Running Hadoop 2.6.0 programs from MyEclipse 8.5
Preface: my environment
- Windows 10, 64-bit
- Hadoop 2.6.0-cdh5.5.2 (or Apache Hadoop 2.6.0), installed on a 64-bit Red Hat Linux 5.5 virtual machine
- MyEclipse 8.5
- JDK 1.7.0_25
1. Software installation and configuration
- Download hadoop-eclipse-plugin-2.6.0.jar and copy it into the D:\MyEclipse 8.5\dropins directory (adjust for your own MyEclipse installation path; many posts online say to put it under eclipse/plugins, but my installation has no such directory, probably an Eclipse version difference), then restart MyEclipse.
Download link for hadoop-eclipse-plugin-2.6.0.jar: http://download.csdn.net/download/kuangshux/8770783
(CSDN sometimes charges points for downloads; annoyingly, I ended up paying 10 yuan for this one.)
- Open the Map/Reduce perspective
In Eclipse, go to Window -> Open Perspective -> Other and choose Map/Reduce.
- Switch to the Map/Reduce Locations tab and create a new Location.
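The exact fields in the New Hadoop Location dialog vary with the plugin build, but as a rough sketch (the host and DFS port are inferred from the hdfs://192.168.8.71:9000 URIs in the job logs later in this post; the Map/Reduce Master port shown here is only the classic JobTracker default and may differ on your cluster):

```
Location name     : h71
Map/Reduce Master : Host = 192.168.8.71, Port = 9001
DFS Master        : Host = 192.168.8.71, Port = 9000  (must match fs.default.name / fs.defaultFS)
User name         : hadoop
```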
- In Project Explorer you can now browse the file system of the Location you just defined.
- Prepare test data and upload it to HDFS:
```
[hadoop@h71 ~]$ hadoop fs -mkdir /input
[hadoop@h71 ~]$ vi he.txt
hello world
hello hadoop
hello hive
[hadoop@h71 ~]$ hadoop fs -put he.txt /input
```
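You can confirm the upload with standard hadoop fs commands, e.g.:

```
[hadoop@h71 ~]$ hadoop fs -cat /input/he.txt
hello world
hello hadoop
hello hive
```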
- Prepare the MapReduce program
Create a new project in MyEclipse and add a class WordCount.java (note this uses the old org.apache.hadoop.mapred API) with the following code:
```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // Mapper: split each line into tokens and emit (word, 1) for every token.
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts for each word.
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = input path, args[1] = output path (must not exist yet)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```
- Run the program
First right-click WordCount.java -> Run As -> Java Application once; then right-click WordCount.java -> Run As -> Run Configurations and set the input and output directory paths as program arguments, as shown in the figure.
Then right-click WordCount.java -> Run As -> Run on Hadoop.
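For this post's cluster the two program arguments would look like the following (the NameNode address comes from the hdfs://192.168.8.71:9000 URIs in the job logs below; the output directory must not exist before the run):

```
hdfs://192.168.8.71:9000/input
hdfs://192.168.8.71:9000/output
```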
The highest JDK that ships with MyEclipse 8.5 by default is 1.6, with which the program fails with an error (screenshot not reproduced here).
At first I figured I would simply install the newest JDK. The latest on the official site was jdk1.8.0_131, so I installed that, but then the run failed with a different error:
The type java.util.Map$Entry cannot be resolved. It is indirectly referenced
Searching around, others who hit this error concluded that a JDK 7 is required after all, so I went back to the official site and downloaded jdk1.7.0_25 (any JDK 7 release should do; I picked this one because the Hadoop cluster in my virtual machine also runs jdk1.7.0_25). For detailed JDK installation steps see "详解win10JDK1.8安装与环境变量配置" and "WIN7 64位系统安装JDK并配置环境变量". Finding JDK 7 on the official site took a while, because the download page only shows the latest version; for locating historical releases see "如何在官网下载java JDK的历史版本". When downloading, be sure to select "Accept License Agreement", otherwise a warning pops up.
(Oracle also made me register an account before downloading, and the password had to contain upper- and lower-case letters plus digits; my first few attempts were rejected, which was maddening.)
Then configure JDK 1.7 in MyEclipse (if you are unsure how, see "myeclipse中如何配置jdk1.7"): right-click the project -> Properties, choose Java Build Path -> Libraries -> JRE System Library [JavaSE-1.6] -> Edit, and point it at the JDK installation directory, C:\Program Files\Java\jdk1.7.0_25.
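A quick sanity check (my own addition; any throwaway class works) to confirm which JDK MyEclipse actually compiles and runs against:

```java
public class JdkCheck {
    public static void main(String[] args) {
        // Should print 1.7.0_25 once the build path points at the new JDK
        System.out.println(System.getProperty("java.version"));
        System.out.println(System.getProperty("java.home"));
    }
}
```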
2. Problems you may run into
Reference: "Windows下使用Hadoop2.6.0-eclipse-plugin插件"
Problem 1:
Could not locate executable null\bin\winutils.exe in the Hadoop binaries
Analysis: the bin directory of Hadoop 2.x downloads does not include winutils.exe.
Fix:
- First extract hadoop-2.6.0-cdh5.5.2.tar.gz somewhere on Windows, e.g. the desktop. (What I don't quite understand is why a local copy of Hadoop is needed on Windows at all, given that MyEclipse is talking to the Hadoop 2.6 cluster in the virtual machine.)
- Download winutils.exe (e.g. from CSDN) and copy it into the bin directory of the extracted hadoop-2.6.0-cdh5.5.2, as shown in the figure.
- In Eclipse, under Window -> Preferences -> Hadoop Map/Reduce, point the plugin at the Hadoop directory you just unpacked, as shown in the figure.
- Set the HADOOP_HOME environment variable and add it to Path, as shown in the figure.
Note: I later tested this once more on Windows 7, and setting only HADOOP_HOME without touching Path also worked. If a reboot is inconvenient right now, you can set it in code instead: System.setProperty("hadoop.home.dir", "C:\\Users\\9\\Desktop\\临时文件夹\\hadoop-2.6.0");
See also: "Could not locate executable null\bin\winutils.exe in the Hadoop binaries解决方式".
A similar code-level setting exists for the HDFS user: System.setProperty("HADOOP_USER_NAME", "hdfs");
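Putting the two together, a minimal sketch of where these calls belong in WordCount (at the very top of main(), before any Hadoop class touches the local file system; the path and user name are the examples from above and must be adapted):

```java
public static void main(String[] args) throws Exception {
    // Local Hadoop directory whose bin\ contains winutils.exe (example path from this post)
    System.setProperty("hadoop.home.dir", "C:\\Users\\9\\Desktop\\临时文件夹\\hadoop-2.6.0");
    // Identity used for HDFS operations; pick a user with write access to the output path
    System.setProperty("HADOOP_USER_NAME", "hdfs");

    JobConf conf = new JobConf(WordCount.class);
    // ... rest of the job setup as above ...
}
```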
Problem 2:
java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Analysis: hadoop.dll is missing from C:\Windows\System32; copying the file there fixes it. (A solution found online attributes this to hadoop.dll version mismatches: the builds needed before and after Hadoop 2.4 differ, so you must pick the version matching your Hadoop release and your operating system, and place it in both Hadoop/bin and C:\Windows\System32.)
Fix: copy hadoop.dll from the archive into C:\Windows\System32, then reboot (it seemed to work for me even without rebooting). If the error persists, also put a copy under %HADOOP_HOME%/bin (in my case the copy in C:\Windows\System32 alone was enough).
Note: hadoop.dll can be downloaded from https://codeload.github.com/s911415/apache-hadoop-3.1.3-winutils/zip/refs/heads/master
Digging further: the stack trace points at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557). Line 557 of NativeIO (shown in the figure) is the Windows-only method that checks whether the current process has the requested access rights to a given path. We can sidestep the check by patching the source to always allow access: download the matching source archive hadoop-2.6.0-src.tar.gz, extract it, copy NativeIO.java from hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the Eclipse project (so it shadows the class inside the jar), and change line 557 to return true, as shown in the figure.
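For reference, the patched method looks roughly like this (sketched from the Hadoop 2.6.0 sources; the line number can shift between releases, so search for access0 rather than relying on 557):

```java
// In the copied org/apache/hadoop/io/nativeio/NativeIO.java, inside the Windows inner class:
public static boolean access(String path, AccessRight desiredAccess)
    throws IOException {
  return true; // patched: skip the native access0() check that fails on Windows
  // original: return access0(path, desiredAccess.accessRight());
}
```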
Problem 3:
Analysis: we have no permission to access the HDFS directory.
Fix: add the following to hdfs-site.xml under etc/hadoop (in Hadoop 2.x this property was renamed dfs.permissions.enabled, but the old name still works):
```
[hadoop@h71 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
```
This disables permission checking entirely, so never do this on a production cluster.
Note: after making this change I still got the same error, so I restarted the Hadoop cluster by running sbin/stop-all.sh and sbin/start-all.sh from the Hadoop home directory.
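For example, with the same paths used earlier in this post:

```
[hadoop@h71 ~]$ cd hadoop-2.6.0-cdh5.5.2
[hadoop@h71 hadoop-2.6.0-cdh5.5.2]$ sbin/stop-all.sh
[hadoop@h71 hadoop-2.6.0-cdh5.5.2]$ sbin/start-all.sh
```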
3. With everything fixed, run the program and check the results
MyEclipse console output:
```
17/04/26 16:04:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/04/26 16:04:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/04/26 16:04:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/04/26 16:04:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/04/26 16:04:30 INFO mapred.FileInputFormat: Total input paths to process : 1
17/04/26 16:04:31 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/04/26 16:04:31 INFO mapred.JobClient: Running job: job_local292222875_0001
17/04/26 16:04:31 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Waiting for map tasks
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Starting task: attempt_local292222875_0001_m_000000_0
17/04/26 16:04:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/04/26 16:04:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null
17/04/26 16:04:31 INFO mapred.MapTask: Processing split: hdfs://192.168.8.71:9000/input/he.txt:0+36
17/04/26 16:04:31 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
17/04/26 16:04:31 INFO mapred.MapTask: numReduceTasks: 1
17/04/26 16:04:31 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/26 16:04:31 INFO mapred.MapTask: io.sort.mb = 100
17/04/26 16:04:31 INFO mapred.MapTask: data buffer = 79691776/99614720
17/04/26 16:04:31 INFO mapred.MapTask: record buffer = 262144/327680
17/04/26 16:04:31 INFO mapred.MapTask: Starting flush of map output
17/04/26 16:04:31 INFO mapred.MapTask: Finished spill 0
17/04/26 16:04:31 INFO mapred.Task: Task:attempt_local292222875_0001_m_000000_0 is done. And is in the process of commiting
17/04/26 16:04:31 INFO mapred.LocalJobRunner: hdfs://192.168.8.71:9000/input/he.txt:0+36
17/04/26 16:04:31 INFO mapred.Task: Task 'attempt_local292222875_0001_m_000000_0' done.
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local292222875_0001_m_000000_0
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Map task executor complete.
17/04/26 16:04:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/04/26 16:04:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null
17/04/26 16:04:31 INFO mapred.LocalJobRunner:
17/04/26 16:04:31 INFO mapred.Merger: Merging 1 sorted segments
17/04/26 16:04:31 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 50 bytes
17/04/26 16:04:31 INFO mapred.LocalJobRunner:
17/04/26 16:04:31 INFO mapred.Task: Task:attempt_local292222875_0001_r_000000_0 is done. And is in the process of commiting
17/04/26 16:04:31 INFO mapred.LocalJobRunner:
17/04/26 16:04:31 INFO mapred.Task: Task attempt_local292222875_0001_r_000000_0 is allowed to commit now
17/04/26 16:04:31 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local292222875_0001_r_000000_0' to hdfs://192.168.8.71:9000/output
17/04/26 16:04:31 INFO mapred.LocalJobRunner: reduce > reduce
17/04/26 16:04:31 INFO mapred.Task: Task 'attempt_local292222875_0001_r_000000_0' done.
17/04/26 16:04:32 INFO mapred.JobClient: map 100% reduce 100%
17/04/26 16:04:32 INFO mapred.JobClient: Job complete: job_local292222875_0001
17/04/26 16:04:32 INFO mapred.JobClient: Counters: 23
17/04/26 16:04:32 INFO mapred.JobClient: File System Counters
17/04/26 16:04:32 INFO mapred.JobClient: FILE: Number of bytes read=338
17/04/26 16:04:32 INFO mapred.JobClient: FILE: Number of bytes written=376680
17/04/26 16:04:32 INFO mapred.JobClient: FILE: Number of read operations=0
17/04/26 16:04:32 INFO mapred.JobClient: FILE: Number of large read operations=0
17/04/26 16:04:32 INFO mapred.JobClient: FILE: Number of write operations=0
17/04/26 16:04:32 INFO mapred.JobClient: HDFS: Number of bytes read=72
17/04/26 16:04:32 INFO mapred.JobClient: HDFS: Number of bytes written=32
17/04/26 16:04:32 INFO mapred.JobClient: HDFS: Number of read operations=12
17/04/26 16:04:32 INFO mapred.JobClient: HDFS: Number of large read operations=0
17/04/26 16:04:32 INFO mapred.JobClient: HDFS: Number of write operations=4
17/04/26 16:04:32 INFO mapred.JobClient: Map-Reduce Framework
17/04/26 16:04:32 INFO mapred.JobClient: Map input records=3
17/04/26 16:04:32 INFO mapred.JobClient: Map output records=6
17/04/26 16:04:32 INFO mapred.JobClient: Map output bytes=60
17/04/26 16:04:32 INFO mapred.JobClient: Input split bytes=90
17/04/26 16:04:32 INFO mapred.JobClient: Combine input records=6
17/04/26 16:04:32 INFO mapred.JobClient: Combine output records=4
17/04/26 16:04:32 INFO mapred.JobClient: Reduce input groups=4
17/04/26 16:04:32 INFO mapred.JobClient: Reduce shuffle bytes=0
17/04/26 16:04:32 INFO mapred.JobClient: Reduce input records=4
17/04/26 16:04:32 INFO mapred.JobClient: Reduce output records=4
17/04/26 16:04:32 INFO mapred.JobClient: Spilled Records=8
17/04/26 16:04:32 INFO mapred.JobClient: Total committed heap usage (bytes)=453640192
17/04/26 16:04:32 INFO mapred.JobClient: File Input Format Counters
17/04/26 16:04:32 INFO mapred.JobClient: Bytes Read=36
```
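Finally, you can verify the output from the cluster side; given the three-line he.txt above, the expected counts are:

```
[hadoop@h71 ~]$ hadoop fs -cat /output/part-00000
hadoop	1
hello	3
hive	1
world	1
```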