Preface: My Environment

  • Windows 10, 64-bit
  • Hadoop 2.6.0-cdh5.5.2 (or Apache Hadoop 2.6.0), installed in a VM running 64-bit Red Hat Linux 5.5
  • MyEclipse 8.5
  • jdk1.7.0_25

I. Software Installation and Configuration

  1. Download hadoop-eclipse-plugin-2.6.0.jar and copy it into D:\MyEclipse 8.5\dropins (adjust for your own MyEclipse install directory; many guides online say to put it in eclipse/plugins, but I don't have that directory, which is probably down to the Eclipse version). Then restart MyEclipse.

hadoop-eclipse-plugin-2.6.0.jar download: http://download.csdn.net/download/kuangshux/8770783
(CSDN sometimes charges points for downloads; I had no choice but to pay 10 yuan for this, which is a rip-off.)

  2. Open the Map/Reduce perspective.
    In Eclipse, open Window -> Open Perspective -> Other and select Map/Reduce.
    (screenshot)
  3. Select the Map/Reduce Locations tab and create a new Location.
    (screenshot)
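    A note on the New Hadoop Location dialog (these values are mine; yours will differ): the Location name is arbitrary, but the DFS Master host and port must match the cluster's fs.defaultFS, which in my case was 192.168.8.71:9000 (the same hdfs://192.168.8.71:9000 that appears in the job log at the end of this post).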
  4. In the Project Explorer, you can now browse the file system of the Location you just defined.
    (screenshot)
  5. Prepare the test data and upload it to HDFS:
[hadoop@h71 ~]$ hadoop fs -mkdir /input
[hadoop@h71 ~]$ vi he.txt
hello world
hello hadoop
hello hive
[hadoop@h71 ~]$ hadoop fs -put he.txt /input
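As a quick sanity check, the file can be read back out of HDFS:
[hadoop@h71 ~]$ hadoop fs -cat /input/he.txt
hello world
hello hadoop
hello hive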
  6. Prepare the MapReduce program.
    Create a new project in MyEclipse and add a WordCount.java with the following code:
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf; 
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

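    // Mapper: split each input line into tokens and emit (word, 1) for each token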
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

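    // Reducer (also used as the combiner): sum the counts collected for each word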
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
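        // Job setup uses the classic org.apache.hadoop.mapred (old) API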
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
  7. Run the program.
    First right-click WordCount.java -> Run As -> Java Application; then right-click WordCount.java -> Run As -> Run Configurations and set the input and output directory paths, as shown below:
    (screenshot)
    Then right-click WordCount.java -> Run As -> Run on Hadoop.
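    For reference, the two program arguments correspond to args[0] (input) and args[1] (output) in main. Judging by the HDFS paths in the job log at the end of this post, mine were the following (substitute your own NameNode address):

hdfs://192.168.8.71:9000/input
hdfs://192.168.8.71:9000/output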

  The highest JDK that ships with MyEclipse by default is 1.6, which produces this error:
(screenshot)
  At first I figured I'd just install the newest release. The latest on the official site was jdk1.8.0_131, so I installed that, but then running the job produced this error instead:
(screenshot)
  Searching online, others had run into the same problem: "The type java.util.Map$Entry cannot be resolved. It is indirectly referenced". The upshot was that a JDK 7 is required after all, so I went back to the official site and downloaded jdk1.7.0_25 (any JDK 7 release should do; I picked this one because the Hadoop cluster in my VM also runs jdk1.7.0_25). For detailed installation steps, see the guides "Detailed JDK 1.8 installation and environment variable configuration on Win10" and "Installing the JDK and configuring environment variables on 64-bit WIN7". Finding JDK 7 on the official site took me a while, because it is a historical release and the download page only shows the latest version; for that, see "How to download historical versions of the Java JDK from the official site". Also, when downloading the JDK you must select Accept License Agreement, otherwise this pops up:
(screenshot)
(On top of that, the site made me register an account before it would let me download, and the password had to contain upper- and lower-case letters, digits, and so on; the first few passwords I tried were rejected, which drove me crazy.)

  Then configure jdk1.7 in MyEclipse; if you're not sure how, see "How to configure jdk1.7 in MyEclipse".

  Now bring in JDK 1.7: right-click the project -> Properties, choose Java Build Path -> Libraries -> JRE System Library [JavaSE-1.6] -> Edit:
(screenshots)
  Then select the JDK install directory, C:\Program Files\Java\jdk1.7.0_25.
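  To confirm which Java the command line now picks up (assuming the JDK's bin directory is on your Path), a quick check looks like this; the first line of output should name the 1.7 release:

C:\>java -version
java version "1.7.0_25"
...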

II. Problems You May Encounter

Reference: "Using the Hadoop 2.6.0 eclipse-plugin on Windows"

Problem 1:

Could not locate executable null\bin\winutils.exe in the Hadoop binaries
(screenshot)
Analysis: Hadoop 2.x release downloads do not ship winutils.exe in the bin directory.

Solution:

  1. First extract hadoop-2.6.0-cdh5.5.2.tar.gz to the desktop. (What I don't quite understand: since MyEclipse talks to the Hadoop 2.6 cluster in the VM, why do we still need to extract a Hadoop tarball on Windows?)

  2. Download winutils.exe (mine came from a CSDN download page) and copy it into the bin directory of the extracted hadoop-2.6.0-cdh5.5.2, as shown:
    (screenshot)

  3. In Eclipse -> Window -> Preferences, under Hadoop Map/Reduce, point the plugin at the Hadoop directory you just put on disk, as shown:
    (screenshot)

  4. Set the HADOOP_HOME and Path environment variables for Hadoop 2, as shown:
    (screenshot)
    (screenshot)
    Note: I later tested this once more on Windows 7, and setting only HADOOP_HOME without touching Path also worked. If a reboot is inconvenient right now, you can set it in code instead: System.setProperty("hadoop.home.dir", "C:\\Users\\9\\Desktop\\临时文件夹\\hadoop-2.6.0");
    See also: fixes for "Could not locate executable null\bin\winutils.exe in the Hadoop binaries".
    A similar code-level setting: System.setProperty("HADOOP_USER_NAME", "hdfs");
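    Putting those two together, a minimal sketch of the start of main (the hadoop.home.dir path below is hypothetical; point it at wherever you extracted Hadoop):

public static void main(String[] args) throws Exception {
    // Hypothetical path: the directory you extracted hadoop-2.6.0 into on Windows
    System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.0");
    // Talk to HDFS as this user instead of your Windows login name
    System.setProperty("HADOOP_USER_NAME", "hdfs");

    JobConf conf = new JobConf(WordCount.class);
    // ... the rest of the WordCount job setup is unchanged
}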

Problem 2:

java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
(screenshot)
Analysis: hadoop.dll is missing from C:\Windows\System32; copying that file into C:\Windows\System32 fixes it. (The explanation I found online: this comes from a hadoop.dll version mismatch. The DLL needed before Hadoop 2.4 differs from the one needed afterwards, so you have to pick the right version, matching your operating system as well, and replace it in both Hadoop/bin and C:\Windows\System32.)

Solution: put the hadoop.dll from the package into C:\Windows\System32, then reboot (it seemed to work for me even without rebooting). It may not be that simple and the same error may still appear; if it does, it's best to copy it into %HADOOP_HOME%/bin as well (in my case, copying it into C:\Windows\System32 alone was enough).
Note: hadoop.dll can be downloaded from https://codeload.github.com/s911415/apache-hadoop-3.1.3-winutils/zip/refs/heads/master

Let's dig a bit deeper: the stack trace points at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557), so look at line 557 of the NativeIO class, as shown:
(screenshot)
  This Windows-only method checks whether the current process has the requested access rights to the given path. To get past the check, we can patch the source ourselves so it always grants access by returning true. Download the matching source tarball hadoop-2.6.0-src.tar.gz, extract it, copy hadoop-2.6.0-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java into the Eclipse project (keeping its package), and change line 557 to return true, as shown:
(screenshot)
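  For reference, a sketch of what the patched method looks like (based on the Hadoop 2.6 source; the original body delegates to the native access0 check):

// In the copied NativeIO.java, inside the Windows inner class, around line 557
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    // Original: return access0(path, desiredAccess.accessRight());
    // Patched for local debugging only: skip the native permission check
    return true;
}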

Problem 3:

(screenshot)
Analysis: we don't have permission to access the HDFS directory.

Solution: add the following to hdfs-site.xml under etc/hadoop on the cluster:

[hadoop@h71 ~]$ vi hadoop-2.6.0-cdh5.5.2/etc/hadoop/hdfs-site.xml 
  <property> 
     <name>dfs.permissions</name> 
     <value>false</value> 
  </property>

This turns off permission checking entirely; needless to say, you shouldn't do this on a production cluster.
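A less drastic alternative (my assumption; I haven't tried it in this setup) is to leave permission checking on and instead open up just the paths the job touches, for example:

[hadoop@h71 ~]$ hadoop fs -chmod -R 777 /input
[hadoop@h71 ~]$ hadoop fs -chmod 777 /

(the second command lets the job create /output under the root), or to set HADOOP_USER_NAME in code as mentioned under Problem 1.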

Note: even after the change above I still got the error, so I restarted the Hadoop cluster: from the Hadoop home directory, run sbin/stop-all.sh and then sbin/start-all.sh.

III. All Problems Solved: Run the Program and Check the Results

(screenshot)

MyEclipse run output:

17/04/26 16:04:25 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/04/26 16:04:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/04/26 16:04:26 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/04/26 16:04:26 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
17/04/26 16:04:30 INFO mapred.FileInputFormat: Total input paths to process : 1
17/04/26 16:04:31 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/04/26 16:04:31 INFO mapred.JobClient: Running job: job_local292222875_0001
17/04/26 16:04:31 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Waiting for map tasks
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Starting task: attempt_local292222875_0001_m_000000_0
17/04/26 16:04:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/04/26 16:04:31 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
17/04/26 16:04:31 INFO mapred.MapTask: Processing split: hdfs://192.168.8.71:9000/input/he.txt:0+36
17/04/26 16:04:31 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and  BYTES_READ as counter name instead
17/04/26 16:04:31 INFO mapred.MapTask: numReduceTasks: 1
17/04/26 16:04:31 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/04/26 16:04:31 INFO mapred.MapTask: io.sort.mb = 100
17/04/26 16:04:31 INFO mapred.MapTask: data buffer = 79691776/99614720
17/04/26 16:04:31 INFO mapred.MapTask: record buffer = 262144/327680
17/04/26 16:04:31 INFO mapred.MapTask: Starting flush of map output
17/04/26 16:04:31 INFO mapred.MapTask: Finished spill 0
17/04/26 16:04:31 INFO mapred.Task: Task:attempt_local292222875_0001_m_000000_0 is done. And is in the process of commiting
17/04/26 16:04:31 INFO mapred.LocalJobRunner: hdfs://192.168.8.71:9000/input/he.txt:0+36
17/04/26 16:04:31 INFO mapred.Task: Task 'attempt_local292222875_0001_m_000000_0' done.
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local292222875_0001_m_000000_0
17/04/26 16:04:31 INFO mapred.LocalJobRunner: Map task executor complete.
17/04/26 16:04:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
17/04/26 16:04:31 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
17/04/26 16:04:31 INFO mapred.LocalJobRunner: 
17/04/26 16:04:31 INFO mapred.Merger: Merging 1 sorted segments
17/04/26 16:04:31 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 50 bytes
17/04/26 16:04:31 INFO mapred.LocalJobRunner: 
17/04/26 16:04:31 INFO mapred.Task: Task:attempt_local292222875_0001_r_000000_0 is done. And is in the process of commiting
17/04/26 16:04:31 INFO mapred.LocalJobRunner: 
17/04/26 16:04:31 INFO mapred.Task: Task attempt_local292222875_0001_r_000000_0 is allowed to commit now
17/04/26 16:04:31 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local292222875_0001_r_000000_0' to hdfs://192.168.8.71:9000/output
17/04/26 16:04:31 INFO mapred.LocalJobRunner: reduce > reduce
17/04/26 16:04:31 INFO mapred.Task: Task 'attempt_local292222875_0001_r_000000_0' done.
17/04/26 16:04:32 INFO mapred.JobClient:  map 100% reduce 100%
17/04/26 16:04:32 INFO mapred.JobClient: Job complete: job_local292222875_0001
17/04/26 16:04:32 INFO mapred.JobClient: Counters: 23
17/04/26 16:04:32 INFO mapred.JobClient:   File System Counters
17/04/26 16:04:32 INFO mapred.JobClient:     FILE: Number of bytes read=338
17/04/26 16:04:32 INFO mapred.JobClient:     FILE: Number of bytes written=376680
17/04/26 16:04:32 INFO mapred.JobClient:     FILE: Number of read operations=0
17/04/26 16:04:32 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/04/26 16:04:32 INFO mapred.JobClient:     FILE: Number of write operations=0
17/04/26 16:04:32 INFO mapred.JobClient:     HDFS: Number of bytes read=72
17/04/26 16:04:32 INFO mapred.JobClient:     HDFS: Number of bytes written=32
17/04/26 16:04:32 INFO mapred.JobClient:     HDFS: Number of read operations=12
17/04/26 16:04:32 INFO mapred.JobClient:     HDFS: Number of large read operations=0
17/04/26 16:04:32 INFO mapred.JobClient:     HDFS: Number of write operations=4
17/04/26 16:04:32 INFO mapred.JobClient:   Map-Reduce Framework
17/04/26 16:04:32 INFO mapred.JobClient:     Map input records=3
17/04/26 16:04:32 INFO mapred.JobClient:     Map output records=6
17/04/26 16:04:32 INFO mapred.JobClient:     Map output bytes=60
17/04/26 16:04:32 INFO mapred.JobClient:     Input split bytes=90
17/04/26 16:04:32 INFO mapred.JobClient:     Combine input records=6
17/04/26 16:04:32 INFO mapred.JobClient:     Combine output records=4
17/04/26 16:04:32 INFO mapred.JobClient:     Reduce input groups=4
17/04/26 16:04:32 INFO mapred.JobClient:     Reduce shuffle bytes=0
17/04/26 16:04:32 INFO mapred.JobClient:     Reduce input records=4
17/04/26 16:04:32 INFO mapred.JobClient:     Reduce output records=4
17/04/26 16:04:32 INFO mapred.JobClient:     Spilled Records=8
17/04/26 16:04:32 INFO mapred.JobClient:     Total committed heap usage (bytes)=453640192
17/04/26 16:04:32 INFO mapred.JobClient:   File Input Format Counters 
17/04/26 16:04:32 INFO mapred.JobClient:     Bytes Read=36
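
To double-check from the cluster side, the output can be read back with hadoop fs -cat. Given the three input lines, the expected counts are the following (consistent with the 4 reduce output records and the 32 HDFS bytes written reported in the log above):

[hadoop@h71 ~]$ hadoop fs -cat /output/part-00000
hadoop	1
hello	3
hive	1
world	1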