Hadoop - WordCount result not written to output file

I am trying to run a program to count the number of words with their frequency by following the steps given in this link: http://developer.yahoo.com/hadoop/tutorial/module3.html

I uploaded one directory called input, which includes three text files.

I managed to configure everything correctly. Now when I run WordCount.java, I do not see anything in the part-00000 file inside the output directory.

Java Code for Mapper:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(WritableComparable key, Writable value,
            OutputCollector output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line.toLowerCase());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, one);
        }
    }

    @Override
    public void map(LongWritable arg0, Text arg1,
            OutputCollector<Text, IntWritable> arg2, Reporter arg3)
            throws IOException {
        // TODO Auto-generated method stub
    }
}

Java code for Reducer:

public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator values,
            OutputCollector output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            //System.out.println(values.next());
            IntWritable value = (IntWritable) values.next();
            sum += value.get(); // process value
        }
        output.collect(key, new IntWritable(sum));
    }
}

Driver code for the word-count job:

public class Counter {
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(com.example.Counter.class);

        // TODO: specify output types
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        // TODO: specify input and output DIRECTORIES (not files)
        conf.setInputPath(new Path("src"));
        conf.setOutputPath(new Path("out"));

        // TODO: specify a mapper
        conf.setMapperClass(org.apache.hadoop.mapred.lib.IdentityMapper.class);

        // TODO: specify a reducer
        conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);

        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

In the console, I get these logs:

13/09/10 10:09:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/10 10:09:20 INFO mapred.FileInputFormat: Total input paths to process : 3
13/09/10 10:09:20 INFO mapred.FileInputFormat: Total input paths to process : 3
13/09/10 10:09:20 INFO mapred.JobClient: Running job: job_201309100855_0012
13/09/10 10:09:21 INFO mapred.JobClient: map 0% reduce 0%
13/09/10 10:09:25 INFO mapred.JobClient: map 25% reduce 0%
13/09/10 10:09:26 INFO mapred.JobClient: map 75% reduce 0%
13/09/10 10:09:27 INFO mapred.JobClient: map 100% reduce 0%
13/09/10 10:09:35 INFO mapred.JobClient: Job complete: job_201309100855_0012
13/09/10 10:09:35 INFO mapred.JobClient: Counters: 15
13/09/10 10:09:35 INFO mapred.JobClient:   File Systems
13/09/10 10:09:35 INFO mapred.JobClient:     HDFS bytes read=54049
13/09/10 10:09:35 INFO mapred.JobClient:     Local bytes read=14
13/09/10 10:09:35 INFO mapred.JobClient:     Local bytes written=214
13/09/10 10:09:35 INFO mapred.JobClient:   Job Counters
13/09/10 10:09:35 INFO mapred.JobClient:     Launched reduce tasks=1
13/09/10 10:09:35 INFO mapred.JobClient:     Launched map tasks=4
13/09/10 10:09:35 INFO mapred.JobClient:     Data-local map tasks=4
13/09/10 10:09:35 INFO mapred.JobClient:   Map-Reduce Framework
13/09/10 10:09:35 INFO mapred.JobClient:     Reduce input groups=0
13/09/10 10:09:35 INFO mapred.JobClient:     Combine output records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map input records=326
13/09/10 10:09:35 INFO mapred.JobClient:     Reduce output records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map output bytes=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map input bytes=50752
13/09/10 10:09:35 INFO mapred.JobClient:     Combine input records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map output records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Reduce input records=0

I am new to Hadoop.

Any pointers on what I am doing wrong would be appreciated.

Thanks.

2 answers

You have two map methods in your Mapper class. The one carrying the @Override annotation is the one that actually overrides the interface method and gets called, and it does nothing. So nothing comes out of your mapper, nothing reaches your reducer, and therefore there is no output.

Remove the map method marked with the @Override annotation, and move @Override to the first map method. Then fix the method signature so it matches the Mapper interface, and it should work.
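Independent of the Hadoop plumbing, the logic that the fixed mapper and reducer compute together can be sketched in plain Java (class and method names here are mine for illustration, not part of the original code): tokenize each line in lowercase, emit (word, 1), and sum the counts per word.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountSketch {

    // Map phase: tokenize each lowercased line into words.
    // Reduce phase: sum the per-word counts into one total per word.
    public static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line.toLowerCase());
            while (itr.hasMoreTokens()) {
                // merge() adds 1 to the existing count, or starts at 1.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords(new String[] {"Hello world", "hello Hadoop"});
        System.out.println(counts.get("hello")); // 2
        System.out.println(counts.get("world")); // 1
    }
}
```

If the Hadoop job is wired correctly, the contents of part-00000 should match what this sketch produces for the same input files.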


I ran into the same problem. I resolved it by removing the auto-generated stub map method and changing the signature of the real map method so that its first argument is a LongWritable. Update the signature as shown below:

@Override
public void map(LongWritable key, Text value,
        OutputCollector output, Reporter reporter) throws IOException {
