How to get input file name in MRjob

I am writing a map function using mrjob. My input comes from files in a directory on HDFS. The file names contain a small but important piece of information that is missing from the file contents themselves. Is there a way to find out, inside the map function, the name of the input file that a given key-value pair came from?

I am looking for the equivalent of this Java code:

 FileSplit fileSplit = (FileSplit) reporter.getInputSplit();
 String fileName = fileSplit.getPath().getName();

Thanks in advance!

2 answers

The map.input.file property provides the name of the current input file.

According to Hadoop: The Definitive Guide:

In the old MapReduce API, these properties can be read from the job configuration by providing an implementation of the configure() method in the Mapper or Reducer, where the configuration is passed in as an argument. In the new API, they can be read from the context object that is passed to all methods of the Mapper or Reducer.
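In an mrjob (Hadoop Streaming) mapper there is no Java context object; instead, Hadoop exports jobconf properties to the task as environment variables, replacing dots with underscores. Below is a minimal sketch of a helper that works under both the old and the new property name (the helper function name is my own, not part of any API):

```python
import os

def input_file_name():
    """Return the current input file's path inside a Hadoop streaming task.

    Hadoop exports jobconf properties to streaming tasks as environment
    variables with dots replaced by underscores: 'map.input.file' becomes
    'map_input_file' (Hadoop 1.x) or 'mapreduce_map_input_file' (2.x).
    Returns None when not running inside a Hadoop task.
    """
    return (os.environ.get('mapreduce_map_input_file')
            or os.environ.get('map_input_file'))
```

mrjob also ships a helper, mrjob.compat.jobconf_from_env('map.input.file'), which performs the same old-name/new-name translation for you.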


If you are using Hadoop 2.x with Python, the property is exposed to the task as an environment variable:

 import os
 file_name = os.environ['mapreduce_map_input_file']
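As a usage sketch, here is a plain streaming-style mapper function that tags every record with its source file; the function shape and the fallback value are my own illustration, and in an MRJob subclass you would read the environment variable the same way inside the mapper() method:

```python
import os

def mapper(line):
    """Yield (file_name, line) pairs, tagging each record with its source file."""
    # The variable is only set when Hadoop launches the task, so fall back
    # to a placeholder when the function is called locally.
    file_name = os.environ.get('mapreduce_map_input_file', '<local>')
    yield file_name, line.rstrip('\n')
```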

Source: https://habr.com/ru/post/920230/
