I am trying to run a program that counts word frequencies, following the steps in this tutorial: http://developer.yahoo.com/hadoop/tutorial/module3.html
I uploaded a directory called input, which contains three text files.
I managed to configure everything correctly, but when I run WordCount.java, the part-00000 file inside the output directory is empty.
Java Code for Mapper:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(WritableComparable key, Writable value,
      OutputCollector output, Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line.toLowerCase());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }

  @Override
  public void map(LongWritable arg0, Text arg1,
      OutputCollector<Text, IntWritable> arg2, Reporter arg3)
      throws IOException {
    // TODO Auto-generated method stub
  }
}
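One thing I notice is that the class ends up with two map methods: my original one with the raw WritableComparable/Writable signature, and the Eclipse-generated stub that actually matches the Mapper<LongWritable, Text, Text, IntWritable> interface. A minimal standalone sketch (plain Java, no Hadoop, hypothetical names) of what I think may be happening — only the method matching the generic interface signature gets called through the interface:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the generic Mapper interface (hypothetical, not Hadoop's).
interface SimpleMapper<K, V> {
    void map(K key, V value, List<String> output);
}

class TwoMapMethods implements SimpleMapper<Long, String> {

    // Raw-typed overload: this is where the word-splitting logic would live,
    // but it does NOT override the interface method, so the framework
    // never calls it.
    public void map(Object key, Object value, List<String> output) {
        output.add("logic ran");
    }

    // This signature matches the generic interface, so this is what gets
    // invoked -- and like my Eclipse-generated stub, it is empty.
    @Override
    public void map(Long key, String value, List<String> output) {
        // empty: nothing is collected
    }
}

public class OverloadDemo {
    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        SimpleMapper<Long, String> m = new TwoMapMethods();
        m.map(1L, "hello world", output);
        // Nothing was collected, because the empty typed method ran.
        System.out.println("records collected: " + output.size());
    }
}
```

If this is the same situation as in my mapper, it would explain why Map output records=0 in the counters.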
Reducer code (abbreviated):
public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator values,
      OutputCollector output, Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      IntWritable value = (IntWritable) values.next();
      sum += value.get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
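For context on what the reducer is supposed to produce, here is a standalone sketch (plain Java, no Hadoop, hypothetical names) of just the summing step from the tutorial's reducer, which adds up the 1s emitted for each word:

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReduceSumDemo {

    // Mirrors the reducer body: add up all the counts emitted for one word.
    static int sum(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three occurrences of "hello" arrive as three 1s.
        Iterator<Integer> counts = Arrays.asList(1, 1, 1).iterator();
        System.out.println("hello\t" + sum(counts));
    }
}
```

Since Reduce input records=0 in my job counters, the reducer never even receives values to sum.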
Driver code (abbreviated):
public class Counter {

  public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(com.example.Counter.class);
    // (input/output paths, mapper/reducer classes, and runJob are set up
    // as in the tutorial)
  }
}
In the console, I get these logs:
13/09/10 10:09:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/10 10:09:20 INFO mapred.FileInputFormat: Total input paths to process : 3
13/09/10 10:09:20 INFO mapred.FileInputFormat: Total input paths to process : 3
13/09/10 10:09:20 INFO mapred.JobClient: Running job: job_201309100855_0012
13/09/10 10:09:21 INFO mapred.JobClient:  map 0% reduce 0%
13/09/10 10:09:25 INFO mapred.JobClient:  map 25% reduce 0%
13/09/10 10:09:26 INFO mapred.JobClient:  map 75% reduce 0%
13/09/10 10:09:27 INFO mapred.JobClient:  map 100% reduce 0%
13/09/10 10:09:35 INFO mapred.JobClient: Job complete: job_201309100855_0012
13/09/10 10:09:35 INFO mapred.JobClient: Counters: 15
13/09/10 10:09:35 INFO mapred.JobClient:   File Systems
13/09/10 10:09:35 INFO mapred.JobClient:     HDFS bytes read=54049
13/09/10 10:09:35 INFO mapred.JobClient:     Local bytes read=14
13/09/10 10:09:35 INFO mapred.JobClient:     Local bytes written=214
13/09/10 10:09:35 INFO mapred.JobClient:   Job Counters
13/09/10 10:09:35 INFO mapred.JobClient:     Launched reduce tasks=1
13/09/10 10:09:35 INFO mapred.JobClient:     Launched map tasks=4
13/09/10 10:09:35 INFO mapred.JobClient:     Data-local map tasks=4
13/09/10 10:09:35 INFO mapred.JobClient:   Map-Reduce Framework
13/09/10 10:09:35 INFO mapred.JobClient:     Reduce input groups=0
13/09/10 10:09:35 INFO mapred.JobClient:     Combine output records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map input records=326
13/09/10 10:09:35 INFO mapred.JobClient:     Reduce output records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map output bytes=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map input records=326
13/09/10 10:09:35 INFO mapred.JobClient:     Map input bytes=50752
13/09/10 10:09:35 INFO mapred.JobClient:     Combine input records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Map output records=0
13/09/10 10:09:35 INFO mapred.JobClient:     Reduce input records=0
I am new to Hadoop, so any pointers on what I am missing would be appreciated.
Thanks.