I am trying to run a job on Elastic MapReduce (EMR) using a custom jar. The job should process about 1000 files in a single directory. When I submit the job with the path s3n://bucketname/compressed/*.xml.gz, I get a "matching 0 files" error. If I pass the full path to a single file (for example, s3n://bucketname/compressed/00001.xml.gz), it works fine, but only that one file is processed. I also tried passing just the directory name (s3n://bucketname/compressed/), hoping that the files inside it would be processed, but that just passes the directory itself to the job.
I also have a small local installation. There, when I submit the job with a wildcard (/path/to/dir/on/hdfs/*.xml.gz), it works fine, and all 1000 files are listed correctly.
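To make the expected behaviour concrete, here is a minimal sketch of the matching the wildcard is supposed to perform. It uses only the Java standard library's glob matcher, not Hadoop's own glob code, so it illustrates the expectation rather than what EMR actually does. The class name, the method name, and the sample keys are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GlobExpectation {

    // Returns the keys that match the glob; '*' does not cross '/' boundaries.
    static List<String> matchKeys(String glob, List<String> keys) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String key : keys) {
            if (matcher.matches(Paths.get(key))) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical object keys standing in for the bucket contents.
        List<String> keys = Arrays.asList(
                "compressed/00001.xml.gz",
                "compressed/00002.xml.gz",
                "compressed/readme.txt",
                "compressed/nested/00003.xml.gz");

        // Expectation: only the .xml.gz files directly under compressed/ are selected.
        System.out.println(matchKeys("compressed/*.xml.gz", keys));
    }
}
```

On the local installation the job input behaves like this sketch; on EMR the same pattern selects nothing.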
How do I get EMR to list all of my files?