Installing Hadoop on Ubuntu (local standalone mode)
Hadoop local installation
Hadoop can be installed in several ways; this article describes the local (standalone) installation.
1. Software preparation
VMware 12 must be installed first; this article uses Ubuntu 16 as the guest system.
* Download JDK 1.8
This article uses the Sun (Oracle) JDK, downloadable from:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
* Download Hadoop 3.0 (this article uses 3.0.3)
Download: http://hadoop.apache.org/releases.html
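2. Installation and configuration
2.1 Install JDK 1.8
Extract the JDK archive into ~/soft/, then move the extracted directory to /usr/soft/ (run this from ~/soft/; both commands need root privileges): sudo mkdir -p /usr/soft && sudo mv jdk1.8.0_111 /usr/soft/
Configure the environment variable:
Open /etc/environment as root (for example sudo vi /etc/environment) and add: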
JAVA_HOME=/usr/soft/jdk1.8.0_111
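Ubuntu ships with OpenJDK preinstalled, so the system has to be configured to use the newly installed JDK instead; for details see: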
https://blog.csdn.net/goodmentc/article/details/80959686
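After switching the default JDK, it is worth confirming which Java actually runs, since Hadoop 3.0 requires Java 8. Below is a small sketch of a shell helper for that; the name java_major is hypothetical (it is not part of Hadoop or the JDK), and it only parses the text printed by `java -version`, assuming the usual `version "..."` format.

```shell
# java_major (hypothetical helper): read the output of `java -version` on stdin
# and print the Java major version ("1.8.0_111" -> 8, "11.0.2" -> 11).
java_major() {
  # Extract the quoted version string from the first line that contains one.
  v=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) printf '%s\n' "$v" | cut -d. -f2 ;;   # old scheme: 1.<major>.<minor>_<update>
    "")  echo "no version string found" >&2; return 1 ;;
    *)   printf '%s\n' "$v" | cut -d. -f1 ;;   # newer scheme: <major>.<minor>.<patch>
  esac
}
```

Usage: `java -version 2>&1 | java_major` should print 8 (java writes its version to stderr, hence `2>&1`), and `readlink -f "$(command -v java)"` shows which JDK directory the `java` command resolves to.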
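2.2 Install Hadoop
Extract the Hadoop archive into /home/tc/soft/, then move the extracted directory to /usr/soft/ (root privileges needed): sudo mv hadoop-3.0.3 /usr/soft/
Then add the following to /etc/environment. This file is not a shell script and does not expand variables, so PATH has to be written out in full: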
HADOOP_HOME=/usr/soft/hadoop-3.0.3
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/soft/jdk1.8.0_111/bin:/usr/soft/hadoop-3.0.3/bin:/usr/soft/hadoop-3.0.3/sbin"
In a terminal, run the following so that the variables take effect in the current shell:
source /etc/environment
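The variables can be sanity-checked with a small shell function such as the sketch below (check_hadoop_env and path_has are hypothetical helper names). It assumes HADOOP_HOME points at the Hadoop directory, as the article's later $HADOOP_HOME commands expect, and it checks JAVA_HOME through printenv, i.e. only if the variable is exported. Because `source /etc/environment` assigns JAVA_HOME without exporting it, expect the check to complain until you log in again.

```shell
# path_has DIR: succeed if DIR is one of the colon-separated entries of $PATH.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_hadoop_env (hypothetical): report problems with JAVA_HOME, HADOOP_HOME and PATH.
check_hadoop_env() {
  rc=0
  # printenv only sees exported variables, i.e. what child processes such as bin/hadoop see.
  java_home=$(printenv JAVA_HOME)
  if [ -z "$java_home" ] || [ ! -x "$java_home/bin/java" ]; then
    echo "JAVA_HOME is not exported or has no bin/java: '$java_home'"
    rc=1
  fi
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is unset"
    return 1
  fi
  for d in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    if ! path_has "$d"; then
      echo "missing from PATH: $d"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_hadoop_env && echo "environment looks OK"`.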
Check whether Hadoop was installed successfully:
Run: hadoop version
If the installation succeeded, the Hadoop version information is printed:
tc@ubuntu:/usr/soft/hadoop-3.0.3$ hadoop version
Hadoop 3.0.3
Source code repository https://yjzhangal@git-wip-us.apache.org/repos/asf/hadoop.git -r 37fd7d752db73d984dc31e0cdfd590d252f5e075
Compiled by yzhang on 2018-05-31T17:12Z
Compiled with protoc 2.5.0
From source with checksum 736cdcefa911261ad56d2d120bf1fa
This command was run using /usr/soft/hadoop-3.0.3/share/hadoop/common/hadoop-common-3.0.3.jar
If the JDK environment variable has not taken effect, the command fails with:
tc@ubuntu:/usr/soft/hadoop-3.0.3$ hadoop version
Error JAVA_HOME is not set and could not be found.
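Solution: restart the VM (logging out and back in is enough). The reason is that source /etc/environment only assigns shell variables in the current terminal without exporting them, so child processes such as the hadoop script do not see JAVA_HOME; at login, the variables in /etc/environment are exported to the whole session. After restarting, the command works: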
tc@ubuntu:~$ hadoop version
Hadoop 3.0.3
Source code repository https://yjzhangal@git-wip-us.apache.org/repos/asf/hadoop.git -r 37fd7d752db73d984dc31e0cdfd590d252f5e075
Compiled by yzhang on 2018-05-31T17:12Z
Compiled with protoc 2.5.0
From source with checksum 736cdcefa911261ad56d2d120bf1fa
This command was run using /usr/soft/hadoop-3.0.3/share/hadoop/common/hadoop-common-3.0.3.jar
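An alternative that does not depend on the login environment is to set JAVA_HOME in Hadoop's own startup script, $HADOOP_HOME/etc/hadoop/hadoop-env.sh, which the hadoop command reads when it starts (the file ships with a commented-out `# export JAVA_HOME=` line). A minimal sketch, assuming the JDK path used in section 2.1:

```shell
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh: tell Hadoop where the JDK lives,
# independently of whatever the shell or login session exported.
export JAVA_HOME=/usr/soft/jdk1.8.0_111
```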
3. Usage
Hadoop ships with a MapReduce example program, $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar. It demonstrates basic MapReduce functionality and can run actual computations, including wordcount, terasort, join and grep.
You can list the MapReduce programs supported by this jar with:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar
tc@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
tc@ubuntu:~$
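The full jar path is long, and the grep and wordcount examples that follow both write to the same output directory. A hypothetical wrapper like the sketch below (run_example is not a Hadoop command) saves some typing. It assumes HADOOP_HOME is set and that the program takes INPUT and OUTPUT as its first two arguments, as grep and wordcount do; it deletes an existing OUTPUT first because Hadoop refuses to write into an existing output directory.

```shell
# run_example PROGRAM INPUT OUTPUT [EXTRA_ARGS...]  (hypothetical helper)
# Runs one of the bundled example programs that take an input and an output path.
run_example() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_example PROGRAM INPUT OUTPUT [ARGS...]" >&2
    return 2
  fi
  jar="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar"
  prog=$1; input=$2; output=$3
  shift 3
  # In standalone mode OUTPUT is a local directory; remove the previous run's results.
  if [ -e "$output" ]; then
    rm -r -- "$output"
  fi
  hadoop jar "$jar" "$prog" "$input" "$output" "$@"
}
```

Usage: `run_example grep input output "test"` or `run_example wordcount input output`.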
3.1 Using grep
Create a directory named input, create a file test.txt inside it, and run:
tancan@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output "test"
When the job succeeds, it prints logs such as:
... ...
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=838860800
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=98
File Output Format Counters
Bytes Written=8
Check the results:
- Run: ll output
tancan@ubuntu:~$ ll output/
total 16
drwxr-xr-x 2 tancan tancan 4096 Jul 7 06:02 ./
drwxr-xr-x 25 tancan tancan 4096 Jul 7 06:02 ../
-rw-r--r-- 1 tancan tancan 0 Jul 7 06:02 part-r-00000
-rw-r--r-- 1 tancan tancan 8 Jul 7 06:02 .part-r-00000.crc
-rw-r--r-- 1 tancan tancan 0 Jul 7 06:02 _SUCCESS
-rw-r--r-- 1 tancan tancan 8 Jul 7 06:02 ._SUCCESS.crc
- Print the result: cat output/*
tancan@ubuntu:~$ cat output/*
1 test
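The grep example counts every match of the regular expression and writes lines of the form count<TAB>match, sorted by descending count. For a small input, the result can be cross-checked without Hadoop using standard tools. The sketch below (grep_check is a hypothetical name) is only an approximation: Hadoop uses Java regular expressions while grep -E uses POSIX extended regular expressions, so the two agree for simple patterns such as "test" but not necessarily for complex ones.

```shell
# grep_check REGEX FILE...  (hypothetical helper)
# Prints "<count>\t<match>" for each distinct match, highest count first,
# mimicking the layout of output/part-r-00000 from the grep example.
grep_check() {
  regex=$1
  shift
  # -o: one match per line, -h: no file names, -E: extended regular expressions.
  grep -ohE -- "$regex" "$@" |
    sort | uniq -c | sort -rn |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}
```

Usage: `grep_check "test" input/*` should print the same as `cat output/part-r-00000`.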
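3.2 Using wordcount
Change the content of input/test.txt to "This is test file", for example: echo "This is test file" > input/test.txt
Hadoop refuses to write into an existing output directory, so first delete the output directory left by the grep run (rm -r output), then run: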
tancan@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount input output/
2018-07-07 06:08:24,179 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-07-07 06:08:24,249 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-07-07 06:08:24,249 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2018-07-07 06:08:24,532 INFO input.FileInputFormat: Total input files to process : 1
2018-07-07 06:08:24,553 INFO mapreduce.JobSubmitter: number of splits:1
2018-07-07 06:08:24,716 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1885415322_0001
2018-07-07 06:08:24,719 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-07-07 06:08:24,853 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2018-07-07 06:08:24,854 INFO mapreduce.Job: Running job: job_local1885415322_0001
2018-07-07 06:08:24,855 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2018-07-07 06:08:24,863 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-07-07 06:08:24,863 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-07-07 06:08:24,865 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2018-07-07 06:08:24,916 INFO mapred.LocalJobRunner: Waiting for map tasks
2018-07-07 06:08:24,916 INFO mapred.LocalJobRunner: Starting task: attempt_local1885415322_0001_m_000000_0
2018-07-07 06:08:24,954 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-07-07 06:08:24,955 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-07-07 06:08:24,978 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-07-07 06:08:24,985 INFO mapred.MapTask: Processing split: file:/home/tancan/input/test.txt:0+18
2018-07-07 06:08:25,082 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2018-07-07 06:08:25,083 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2018-07-07 06:08:25,083 INFO mapred.MapTask: soft limit at 83886080
2018-07-07 06:08:25,083 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2018-07-07 06:08:25,083 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2018-07-07 06:08:25,089 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2018-07-07 06:08:25,102 INFO mapred.LocalJobRunner:
2018-07-07 06:08:25,102 INFO mapred.MapTask: Starting flush of map output
2018-07-07 06:08:25,102 INFO mapred.MapTask: Spilling map output
2018-07-07 06:08:25,102 INFO mapred.MapTask: bufstart = 0; bufend = 34; bufvoid = 104857600
2018-07-07 06:08:25,102 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
2018-07-07 06:08:25,121 INFO mapred.MapTask: Finished spill 0
2018-07-07 06:08:25,135 INFO mapred.Task: Task:attempt_local1885415322_0001_m_000000_0 is done. And is in the process of committing
2018-07-07 06:08:25,137 INFO mapred.LocalJobRunner: map
2018-07-07 06:08:25,138 INFO mapred.Task: Task 'attempt_local1885415322_0001_m_000000_0' done.
2018-07-07 06:08:25,144 INFO mapred.Task: Final Counters for attempt_local1885415322_0001_m_000000_0: Counters: 18
File System Counters
FILE: Number of bytes read=316165
FILE: Number of bytes written=784641
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=4
Map output bytes=34
Map output materialized bytes=48
Input split bytes=97
Combine input records=4
Combine output records=4
Spilled Records=4
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=216006656
File Input Format Counters
Bytes Read=18
2018-07-07 06:08:25,145 INFO mapred.LocalJobRunner: Finishing task: attempt_local1885415322_0001_m_000000_0
2018-07-07 06:08:25,145 INFO mapred.LocalJobRunner: map task executor complete.
2018-07-07 06:08:25,149 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2018-07-07 06:08:25,150 INFO mapred.LocalJobRunner: Starting task: attempt_local1885415322_0001_r_000000_0
2018-07-07 06:08:25,166 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-07-07 06:08:25,166 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-07-07 06:08:25,166 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2018-07-07 06:08:25,170 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4e822838
2018-07-07 06:08:25,172 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-07-07 06:08:25,200 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=322594400, maxSingleShuffleLimit=80648600, mergeThreshold=212912320, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-07-07 06:08:25,202 INFO reduce.EventFetcher: attempt_local1885415322_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2018-07-07 06:08:25,243 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1885415322_0001_m_000000_0 decomp: 44 len: 48 to MEMORY
2018-07-07 06:08:25,249 INFO reduce.InMemoryMapOutput: Read 44 bytes from map-output for attempt_local1885415322_0001_m_000000_0
2018-07-07 06:08:25,250 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 44, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->44
2018-07-07 06:08:25,251 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2018-07-07 06:08:25,252 INFO mapred.LocalJobRunner: 1 / 1 copied.
2018-07-07 06:08:25,256 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2018-07-07 06:08:25,267 INFO mapred.Merger: Merging 1 sorted segments
2018-07-07 06:08:25,268 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 37 bytes
2018-07-07 06:08:25,269 INFO reduce.MergeManagerImpl: Merged 1 segments, 44 bytes to disk to satisfy reduce memory limit
2018-07-07 06:08:25,270 INFO reduce.MergeManagerImpl: Merging 1 files, 48 bytes from disk
2018-07-07 06:08:25,271 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2018-07-07 06:08:25,271 INFO mapred.Merger: Merging 1 sorted segments
2018-07-07 06:08:25,273 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 37 bytes
2018-07-07 06:08:25,273 INFO mapred.LocalJobRunner: 1 / 1 copied.
2018-07-07 06:08:25,279 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2018-07-07 06:08:25,280 INFO mapred.Task: Task:attempt_local1885415322_0001_r_000000_0 is done. And is in the process of committing
2018-07-07 06:08:25,281 INFO mapred.LocalJobRunner: 1 / 1 copied.
2018-07-07 06:08:25,282 INFO mapred.Task: Task attempt_local1885415322_0001_r_000000_0 is allowed to commit now
2018-07-07 06:08:25,283 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1885415322_0001_r_000000_0' to file:/home/tancan/output
2018-07-07 06:08:25,284 INFO mapred.LocalJobRunner: reduce > reduce
2018-07-07 06:08:25,284 INFO mapred.Task: Task 'attempt_local1885415322_0001_r_000000_0' done.
2018-07-07 06:08:25,285 INFO mapred.Task: Final Counters for attempt_local1885415322_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=316293
FILE: Number of bytes written=784727
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=4
Reduce shuffle bytes=48
Reduce input records=4
Reduce output records=4
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=216006656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=38
2018-07-07 06:08:25,286 INFO mapred.LocalJobRunner: Finishing task: attempt_local1885415322_0001_r_000000_0
2018-07-07 06:08:25,286 INFO mapred.LocalJobRunner: reduce task executor complete.
2018-07-07 06:08:25,861 INFO mapreduce.Job: Job job_local1885415322_0001 running in uber mode : false
2018-07-07 06:08:25,864 INFO mapreduce.Job: map 100% reduce 100%
2018-07-07 06:08:25,866 INFO mapreduce.Job: Job job_local1885415322_0001 completed successfully
2018-07-07 06:08:25,883 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=632458
FILE: Number of bytes written=1569368
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=1
Map output records=4
Map output bytes=34
Map output materialized bytes=48
Input split bytes=97
Combine input records=4
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=48
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=432013312
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=18
File Output Format Counters
Bytes Written=38
Check the result:
tancan@ubuntu:~$ cat output/*
This 1
file 1
is 1
test 1
tancan@ubuntu:~$
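The wordcount output can be checked the same way. The WordCount example splits each line with java.util.StringTokenizer (default delimiters: space, tab, newline, carriage return, form feed), and Hadoop sorts the Text keys byte by byte, which is why "This" comes before "file" above. The sketch below (wc_check is a hypothetical helper) reproduces that with tr and a C-locale sort.

```shell
# wc_check FILE...  (hypothetical helper)
# Approximates the wordcount example: split on StringTokenizer's default
# whitespace, sort in byte order like Hadoop's Text keys, print "<word>\t<count>".
wc_check() {
  cat -- "$@" |
    tr -s ' \t\r\f' '\n' |   # one token per line, runs of delimiters squeezed
    grep -v '^$' |           # drop the empty line left by leading whitespace
    LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }'
}
```

Usage: `wc_check input/test.txt` should match `cat output/*` above.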
4. Problems encountered during installation
4.1 Errors when running Hadoop installed on a 64-bit system
tancan@ubuntu:~$ hadoop fs -ls /
Java HotSpot(TM) Server VM warning: You have loaded library /usr/soft/hadoop-3.0.3/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2018-06-28 06:52:31,341 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From ubuntu/127.0.1.1 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
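Cause:
The native library in the release downloaded from the official site was 32-bit and cannot be used on a 64-bit host. The fix is to download the Hadoop source code and compile it, then copy the files from the resulting native directory into ${HADOOP_HOME}/lib/native.
Note that the "Unable to load native-hadoop library" line on its own is only a warning: as the message says, Hadoop falls back to built-in Java classes. The final "Connection refused" line to master:9000 is a separate problem: the configuration points the default file system at an HDFS NameNode on master:9000 that is not running. In pure local mode, fs.defaultFS should be left at its default value, file:///.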
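Before rebuilding, you can check whether the bundled library really is 32-bit. `file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0` and `hadoop checknative -a` both report this; the sketch below (elf_class is a hypothetical helper) shows what they look at, namely byte 4 of the ELF header (EI_CLASS), which is 1 for 32-bit and 2 for 64-bit objects. The log above also says "Server VM" rather than "64-Bit Server VM", which usually indicates a 32-bit JVM, so running the same check on $JAVA_HOME/bin/java may be worthwhile; `uname -m` prints x86_64 on a 64-bit system.

```shell
# elf_class FILE  (hypothetical helper): print whether an ELF binary is 32- or 64-bit.
elf_class() {
  # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
  magic=$(od -An -tx1 -N4 -- "$1" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not an ELF file: $1" >&2
    return 1
  fi
  # Byte 4 (EI_CLASS): 1 = ELFCLASS32, 2 = ELFCLASS64.
  class=$(od -An -tu1 -j4 -N1 -- "$1" | tr -d ' \n')
  case "$class" in
    1) echo "32-bit" ;;
    2) echo "64-bit" ;;
    *) echo "unknown ELF class: $class" >&2; return 1 ;;
  esac
}
```

Usage: `elf_class "$HADOOP_HOME/lib/native/libhadoop.so.1.0.0"`.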
5. Attachment
The native library files I used:
https://download.csdn.net/download/goodmentc/10528791