How to Create and Run an Eclipse Project with a MapReduce Sample

This tutorial demonstrates how to create and run a MapReduce sample project with the Eclipse IDE. It does not discuss the details of the actual code; for that, visit the official Apache Hadoop website.

https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

Note: To install Eclipse for Java in a Lubuntu or Ubuntu environment, visit the link below.

http://shabdar.org/hadoop-java/137-how-to-install-eclipse-for-java-in-lubuntu-or-ubuntu.html

Steps

Open Eclipse and create a new Java project:

File -> New -> Java Project

Enter WordCount as the project name, click Next, and then click Finish.

Right-click the WordCount project in Package Explorer and open Properties.

Under Java Build Path > Libraries, click Add External JARs and add all JAR files from the folders below. These paths are based on my Hadoop installation directory, /usr/local/hadoop; change them to match where you installed Hadoop. The share/hadoop/common folder holds the hadoop-common JAR, which provides classes such as Configuration and Path that the sample uses.

/usr/local/hadoop/share/hadoop/mapreduce

/usr/local/hadoop/share/hadoop/common

/usr/local/hadoop/share/hadoop/common/lib
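If you want to check which JAR files those folders actually contain before adding them in Eclipse, a small standard-library sketch like the one below can list them. The class name JarLister is hypothetical, and the folder list in main assumes a /usr/local/hadoop installation; adjust it to yours.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class JarLister {

    // Collects the .jar files directly inside each folder (not recursive).
    // Folders that do not exist are skipped; results are sorted per folder.
    static List<Path> listJars(List<Path> folders) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (Path folder : folders) {
            if (!Files.isDirectory(folder)) {
                continue;
            }
            try (Stream<Path> entries = Files.list(folder)) {
                entries.filter(Files::isRegularFile)
                       .filter(p -> p.getFileName().toString().endsWith(".jar"))
                       .sorted()
                       .forEach(jars::add);
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        // Assumed Hadoop location; adjust to your installation.
        Path share = Paths.get("/usr/local/hadoop/share/hadoop");
        List<Path> folders = List.of(
                share.resolve("mapreduce"),
                share.resolve("common"),
                share.resolve("common/lib"));
        List<Path> jars = listJars(folders);
        jars.forEach(System.out::println);
        System.out.println(jars.size() + " JAR files found");
    }
}
```

Because a missing folder contributes nothing to the list, a suspiciously small count is a quick hint that one of the paths is mistyped.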


Right-click the project and add a new class named WordCount (WordCount.java).

Copy and paste the code below into this .java file. This snippet is taken from the official Apache Hadoop website. You should probably get the latest code from the link below instead of copying it from here, since a newer version may be available there.

https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

----------------WordCount.java Start--------------------

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Emits (word, 1) for every whitespace-separated token in each input line.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Sums all counts received for a word; also used as the combiner below.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path, args[1] the output path (must not exist yet).
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

----------------WordCount.java End--------------------

Make sure there are no errors in the code after pasting it. Then right-click the project and choose Export.

Under Java, select JAR file.

Click Next. Uncheck all other resources, then enter a path for the exported .jar file. It can be any path, but remember it, because you need it to run the JAR file later.

Keep the options selected as shown below.

Click Next and Finish.

Once the JAR is exported, you can run it. Before that, create an input folder on HDFS to hold one or more text files whose words will be counted.

hadoop fs -mkdir input

Create a text file with some text and copy it into the input folder. You can create multiple text files if you wish.

vi wordcounttest.txt

Add some sample text to this file and save it. On Ubuntu you can also use the nano editor, which is easier to use than vi.

hadoop fs -put wordcounttest.txt input/wordcounttest.txt

Make sure the output directory does not exist yet. If it already exists, remove it with the command below; otherwise the job fails with an error saying that the output directory already exists.

hadoop fs -rm -r output

Now run the JAR file with the command below. You may need to adjust the JAR path depending on where you exported it.

hadoop jar /home/shabdar/workspace/WordCount/WordCount.jar WordCount input/ output/
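In this command, hadoop jar takes the JAR path and the main class name (WordCount); assuming the exported JAR's manifest does not name a main class, the remaining tokens input/ and output/ reach main as args[0] and args[1]. The sample's main method reads args[1] directly, so leaving out a path ends in an ArrayIndexOutOfBoundsException. The guard below is a hypothetical sketch (the class ArgsCheck is not part of the official sample) of how you might reject bad arguments up front.

```java
public class ArgsCheck {

    // Hypothetical guard for WordCount.main: returns a usage message when
    // the arguments are not exactly <input path> <output path>, else null.
    static String validate(String[] args) {
        if (args.length != 2) {
            return "Usage: WordCount <input path> <output path>";
        }
        return null;
    }

    public static void main(String[] args) {
        // The arguments that "hadoop jar ... WordCount input/ output/" passes to main.
        String[] fromCommand = {"input/", "output/"};
        System.out.println("valid: " + (validate(fromCommand) == null));

        // Forgetting the output path is caught before args[1] is read.
        String message = validate(new String[] {"input/"});
        if (message != null) {
            System.err.println(message);
        }
    }
}
```

Inside WordCount.main you would call validate(args) first and, if it returns a message, print it and exit with a non-zero status before building the Job.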

Note that this program reads all text files in the input folder on HDFS and counts how many times each distinct word occurs across all of them. The output is stored in the output folder on HDFS.
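To make concrete what the job computes, here is a plain-Java sketch with no Hadoop dependency that mimics TokenizerMapper and IntSumReducer in a single JVM. It is an illustration only, not how Hadoop executes the job; the class name LocalWordCount and the sample strings are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Splits each document on whitespace (as StringTokenizer does in the mapper),
    // adds 1 per token, and sums per distinct word (as the reducer does).
    // A TreeMap keeps words sorted, similar to the key order a single reducer sees
    // for ASCII text.
    static Map<String, Integer> count(List<String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : documents) {
            StringTokenizer itr = new StringTokenizer(doc);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Contents of two hypothetical input files.
        List<String> files = List.of("hi this is a test", "hi is");
        count(files).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because tokens are split only on whitespace, matching is case-sensitive and punctuation stays attached: "Hi", "hi" and "hi," are counted as three different words.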

Once the run succeeds, you should see the following two files in the output folder:

hadoop fs -ls output

-rw-r--r--   1 hduser supergroup          0 2016-07-15 13:54 output/_SUCCESS

-rw-r--r--   1 hduser supergroup         73 2016-07-15 13:54 output/part-r-00000

The _SUCCESS file is empty; its presence indicates that the run succeeded. To see the output of the run, view the part-r-00000 file.

hadoop fs -cat output/part-r-00000
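To check a run programmatically instead of by eye, you could first copy the folder to the local file system with hadoop fs -get output output-local (the local folder name is only an example) and then use a standard-library sketch like the one below. It assumes the layout shown above: an empty _SUCCESS marker plus one part-r-NNNNN file per reducer. The class name OutputCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class OutputCheck {

    // True if the output folder contains the _SUCCESS marker file.
    static boolean succeeded(Path outputDir) {
        return Files.isRegularFile(outputDir.resolve("_SUCCESS"));
    }

    // Returns the reducer output files (part-r-00000, part-r-00001, ...) sorted by name.
    static List<Path> partFiles(Path outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(outputDir, "part-r-*")) {
            for (Path p : stream) {
                parts.add(p);
            }
        }
        parts.sort(null); // natural Path ordering
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copy made with: hadoop fs -get output output-local
        Path out = Paths.get(args.length > 0 ? args[0] : "output-local");
        if (!succeeded(out)) {
            System.err.println("No _SUCCESS marker in " + out + "; the job may have failed.");
            return;
        }
        for (Path part : partFiles(out)) {
            System.out.println(part.getFileName() + ": " + Files.size(part) + " bytes");
        }
    }
}
```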

Sample Output

hi	2
is	2
test	1
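Each line of part-r-00000 holds a word and its count separated by a tab, the default separator of Hadoop's TextOutputFormat. If you copy the file locally (for example with hadoop fs -get), a short sketch like this can load it back into a map, e.g. to compare two runs. The class name OutputParser is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OutputParser {

    // Parses "word<TAB>count" lines into a map that keeps the file's order.
    // Blank lines are skipped; a line without a tab is rejected.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue;
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Missing tab separator: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The sample output lines shown above.
        List<String> sample = List.of("hi\t2", "is\t2", "test\t1");
        parse(sample).forEach((w, n) -> System.out.println(w + " -> " + n));
    }
}
```

Splitting on the tab rather than on any whitespace is safe here because the mapper's StringTokenizer never produces a word that contains a tab.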