As already mentioned, setup() and cleanup() are methods you can override if you choose, and they exist to let you initialize and tear down your map/reduce tasks. You do not have direct access to any data from the input split at these stages. The life cycle of a map/reduce task, from the programmer's point of view, is:
setup → map → cleanup
setup → reduce → cleanup
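In code terms, the hooks look like this. Here is a minimal Mapper skeleton (the class name and the commented-out parameter key are illustrative, not part of any Hadoop API):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LifecycleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Called once per task, before any input records are processed.
        // Typical use: read job parameters from the Configuration.
        Configuration conf = context.getConfiguration();
        // e.g. String param = conf.get("my.custom.param");  // "my.custom.param" is a made-up key
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once per input record of the split assigned to this task.
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called once per task, after the last record has been processed.
        // Typical use: release resources or emit accumulated results.
    }
}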
What usually happens during setup() is that you read parameters from the Configuration object to parameterize your processing logic.
What usually happens during cleanup() is that you release whatever resources you allocated. Another use is to flush out any accumulated aggregate results.
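As an illustration of that second use, here is a sketch of the so-called in-mapper combining pattern, where map() accumulates partial counts in a task-level HashMap and cleanup() flushes them once at the end (the class name is made up):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InMapperCombiningMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Map<String, Integer> counts;

    @Override
    protected void setup(Context context) {
        // Allocate the per-task accumulator before any records arrive.
        counts = new HashMap<String, Integer>();
    }

    @Override
    public void map(LongWritable key, Text value, Context context) {
        // Accumulate counts in memory instead of emitting one pair per word.
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            Integer count = counts.get(token);
            counts.put(token, count == null ? 1 : count + 1);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Flush the accumulated partial counts once, after the last record.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}

This trades memory for fewer intermediate records, so it is only safe when the distinct keys seen by one task fit in memory.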
The setup() and cleanup() methods are simply "hooks" for you, the developer/programmer, to do something before and after your map/reduce tasks.
For example, in the canonical word-count example, suppose you want to exclude certain words from the count (for example, stop words such as "the", "a", "be", etc.). When you configure your MapReduce job, you can pass a comma-delimited list of these words as a parameter (a key/value pair) to the Configuration object. Then, in your map code, during setup() you can fetch the stop words and save them in a global variable (global to the map task), and skip counting those words in your map logic. Below is a modified version of the example from http://wiki.apache.org/hadoop/WordCount .
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        private Set<String> stopWords;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Read the comma-delimited stop-word list from the job configuration
            // and keep it in a task-level field for use by map().
            Configuration conf = context.getConfiguration();
            stopWords = new HashSet<String>();
            for (String stopWord : conf.get("stop.words").split(",")) {
                stopWords.add(stopWord.trim()); // trim() tolerates spaces after the commas
            }
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String token = tokenizer.nextToken();
                if (stopWords.contains(token)) {
                    continue; // skip stop words
                }
                word.set(token); // emit the token with a count of 1
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pass the stop-word list to every task through the Configuration.
        conf.set("stop.words", "the, a, an, be, but, can");

        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
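Assuming the class is packaged into a jar (the jar name and the paths here are made up), the job would be launched in the usual way, with the input and output paths as arguments:

hadoop jar wordcount.jar WordCount /user/me/input /user/me/output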