How to get input file name in MRjob

I am writing a map function using mrjob. My input comes from files in a directory on HDFS. The file names contain a small but important piece of information that is missing from the file contents themselves. Is there a way to find out, inside the map function, the name of the input file that a given key-value pair came from?

I am looking for the equivalent of this Java code:

 FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
 String fileName = fileSplit.getPath().getName();

Thanks in advance!

2 answers

The map.input.file property provides the name of the current input file.

According to Hadoop: The Definitive Guide:

In the old MapReduce API, these properties can be read from the job configuration by providing an implementation of the configure() method in the Mapper or Reducer, where the configuration is passed in as an argument. In the new API, they can be read from the context object that is passed to all methods of the Mapper or Reducer.
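In an mrjob (Hadoop Streaming) mapper there is no Java context object; instead, Hadoop exports jobconf properties to the task as environment variables, replacing dots with underscores. Below is a minimal sketch of a helper that works under both the old and the new property name (the helper function name is my own, not part of any API):

```python
import os

def input_file_name():
    """Return the current input file's path inside a Hadoop streaming task.

    Hadoop exports jobconf properties to streaming tasks as environment
    variables with dots replaced by underscores: 'map.input.file' becomes
    'map_input_file' (Hadoop 1.x) or 'mapreduce_map_input_file' (2.x).
    Returns None when not running inside a Hadoop task.
    """
    return (os.environ.get('mapreduce_map_input_file')
            or os.environ.get('map_input_file'))
```

mrjob also ships a helper, mrjob.compat.jobconf_from_env('map.input.file'), which performs the same old-name/new-name translation for you.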


If you are using Hadoop 2.x with Python, the property is exposed to the task as an environment variable:

 import os
 file_name = os.environ['mapreduce_map_input_file']
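As a usage sketch, here is a plain streaming-style mapper function that tags every record with its source file; the function shape and the fallback value are my own illustration, and in an MRJob subclass you would read the environment variable the same way inside the mapper() method:

```python
import os

def mapper(line):
    """Yield (file_name, line) pairs, tagging each record with its source file."""
    # The variable is only set when Hadoop launches the task, so fall back
    # to a placeholder when the function is called locally.
    file_name = os.environ.get('mapreduce_map_input_file', '<local>')
    yield file_name, line.rstrip('\n')
```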

Source: https://habr.com/ru/post/920230/
