In the previous class we learned about Tutorial 5: Hadoop MapReduce. Check out all articles in the Big Data FREE course here:
Big Data Course Syllabus
Tutorial | Introduction to Big Data |
Tutorial | Introduction to Hadoop Architecture, and Components |
Tutorial | Hadoop installation on Windows |
Tutorial | HDFS Read & Write Operation using Java API |
Tutorial | Hadoop MapReduce |
Tutorial | Hadoop MapReduce First Program |
Tutorial | Hadoop MapReduce and Counter |
Tutorial | Apache Sqoop |
Tutorial | Apache Flume |
Tutorial | Hadoop Pig |
Tutorial | Apache Oozie |
Tutorial | Big Data Testing |
In this tutorial, we are going to write our first Hadoop MapReduce program in order to understand its functionality in detail. Like any other computer program, a Hadoop job requires input data, which we are going to provide in the form of a spreadsheet. The input spreadsheet [ItemsSalesData.csv] has the following data fields for each sales record:
- Sales Date
- Item name
- Item price
- Payment Method
- Customer Name
- Customer Residence City
- Customer Residence Province
- Customer Country
The end goal of the Hadoop MapReduce program is to figure out the number of items sold in each country from the customer records in the spreadsheet [ItemsSalesData.csv].
Step 1: First of all, ensure that Hadoop is installed on your machine. To begin the actual process, switch to the user id used during the Hadoop configuration, e.g. 'hduser'.
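For example, if the user created during the Hadoop setup is 'hduser' (substitute your own user id if it differs):

su - hduser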
Step 2: Create a working folder with the required permissions, as shown below.
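A typical way to create the folder (HadoopMapReduceExample is the folder name referenced in Step 5; any name will do):

sudo mkdir HadoopMapReduceExample
sudo chmod -R 777 HadoopMapReduceExample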
Step 3: Write the MapReduce program as the following three Java classes, and ensure the deployed binaries in the above folder have read permission.
ItemsMapper.java
package itemcountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

/**
 * Mapper: emits (country, 1) for every sales record.
 *
 * @author STC
 */
public class ItemsMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String valueString = value.toString();
        // Split the CSV record; the 8th field (index 7) is the customer country.
        String[] singleCountryData = valueString.split(",");
        output.collect(new Text(singleCountryData[7]), one);
    }
}
ItemsCountryReducer.java
package itemcountry;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * Reducer: sums up the counts emitted by the mapper for each country.
 *
 * @author STC
 */
public class ItemsCountryReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int frequencyForCountry = 0;
        while (values.hasNext()) {
            // Add up the per-record counts for this country.
            IntWritable value = values.next();
            frequencyForCountry += value.get();
        }
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}
ItemsCountryDriver.java
package itemcountry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

/**
 * Driver: configures and submits the "sales per country" job.
 *
 * @author STC
 */
public class ItemsCountryDriver {

    public static void main(String[] args) {
        JobClient myJobClient = new JobClient();

        // Creation of a configuration object for the job
        JobConf jobConf = new JobConf(ItemsCountryDriver.class);

        // Setting a name for the job
        jobConf.setJobName("SalePerCountry");

        // Specify the data types of the output key and value
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(IntWritable.class);

        // Specify the Mapper and Reducer classes
        jobConf.setMapperClass(ItemsMapper.class);
        jobConf.setReducerClass(ItemsCountryReducer.class);

        // Specify the input and output formats
        jobConf.setInputFormat(TextInputFormat.class);
        jobConf.setOutputFormat(TextOutputFormat.class);

        // Set the input and output directories:
        // args[0] = name of the input directory on HDFS
        // args[1] = name of the output directory on HDFS
        FileInputFormat.setInputPaths(jobConf, new Path(args[0]));
        FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));

        myJobClient.setConf(jobConf);
        try {
            // Execute the job
            JobClient.runJob(jobConf);
        } catch (Exception exp) {
            exp.printStackTrace();
        }
    }
}
Step 4: Export the classpath as shown below. The above three Java classes require the following runtime libraries, so these paths need to be exported.
- hadoop-mapreduce-client-core-3.2.0.jar
- hadoop-mapreduce-client-common-3.2.0.jar
- hadoop-common-3.2.0.jar
- hadoop-mapred-0.22.jar
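A representative export, assuming the jars sit in the standard Hadoop 3.2.0 layout under $HADOOP_HOME/share/hadoop; adjust the paths to wherever these jars actually live on your machine:

export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.2.0.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.2.0.jar:$HADOOP_HOME/share/hadoop/common/hadoop-common-3.2.0.jar:$HOME/HadoopMapReduceExample/hadoop-mapred-0.22.jar:$HOME/HadoopMapReduceExample"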
Step 5: Compile the above Java files using the following command. The compiled binaries, i.e. the class files, will be placed in the package directory.
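Run the compiler from inside the HadoopMapReduceExample folder; javac picks up the CLASSPATH exported in Step 4:

javac -d . ItemsMapper.java ItemsCountryReducer.java ItemsCountryDriver.java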
The above compilation creates a directory inside HadoopMapReduceExample named itemcountry, i.e. the package name specified in the Java source files, and puts all three compiled class files, i.e. the binaries, into it.
Step 6: Create a new file Manifest.txt [sudo gedit Manifest.txt] and add the following line to it. It is nothing but the fully qualified name of the Java main class. Don't forget to hit the Enter key after adding the line.
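The line to add, i.e. the fully qualified name of the driver class from Step 3 (followed by a newline):

Main-Class: itemcountry.ItemsCountryDriver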
Step 7: Create a JAR file with the help of the following command.
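For example (the JAR name ItemsCountry.jar is an arbitrary choice; the manifest from Step 6 supplies the main class):

jar cfm ItemsCountry.jar Manifest.txt itemcountry/*.class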
Step 8: Start Hadoop by executing the following commands.
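On a typical Hadoop 3.x installation the daemons are started as follows (script locations may differ on your setup):

$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh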
Step 9: Copy the spreadsheet [ItemsSalesData.csv], which holds the country-wise item sales data, to the location ~/inputMapReduce. Next, use the following command to copy ~/inputMapReduce to HDFS.
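A representative copy command, assuming the running user may write to the HDFS root (adjust the target path to your setup); this places the directory at /inputMapReduce on HDFS:

hdfs dfs -copyFromLocal ~/inputMapReduce /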
Step 10: Once the CSV spreadsheet [ItemsSalesData.csv] with the country-wise item sales data has been copied, run the MapReduce job as follows.
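A representative invocation, using the JAR built in Step 7 and the HDFS directory copied in Step 9 (args[0] is the input directory, args[1] the output directory):

hadoop jar ItemsCountry.jar /inputMapReduce /mapreduce_output_item_sales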
It will create an output directory named mapreduce_output_item_sales on HDFS, containing a file with the product sales per country. The result can be viewed through the command-line interface as given below.
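With the old MapReduce API the reducer output typically lands in a part-00000 file, so a command along these lines should print the per-country totals:

hdfs dfs -cat /mapreduce_output_item_sales/part-00000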
Similarly, the output can be seen on the Hadoop web interface by browsing to the mapreduce_output_item_sales directory from the URL.
Conclusion
In this tutorial, we discussed the environment setup and used a Hadoop MapReduce program to extract country-wise item sales from the 8-column spreadsheet [ItemsSalesData.csv], demonstrating the operation of Hadoop HDFS with a MapReduce program.
Happy Testing!!!