S3DistCp srcPattern not working?

I have files like this in S3:

 1-2013-08-22-22-something
 2-2013-08-22-22-something
 etc.

Without srcPattern I can easily get all the files from the bucket, but I want to get only the files with a specific prefix, for example all files starting with 1. I tried using srcPattern, but for some reason it did not pick up any of the files.

My current command is:

 elastic-mapreduce --jobflow $JOBFLOW --jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
   --args '--src,s3n://some-bucket/,--dest,hdfs:///hdfs-input,--srcPattern,[0-9]-.*' \
   --step-name "copying over s3 files"
1 answer

It turns out you need to put .* before the regex.

For example, I needed:

 .*[0-9]-.* 

I assume srcPattern is matched against the full source path, which also includes the bucket name?
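
A quick way to see why the leading .* matters is to simulate the match locally. This is only a sketch: it assumes S3DistCp requires the pattern to match the entire path, bucket included, rather than any part of it. The `full_match` helper is hypothetical and uses `grep -E` with `^` and `$` anchors to imitate that behaviour; it does not call S3DistCp itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the WHOLE path matches the pattern.
# Whole-string matching is an assumption about how --srcPattern is applied.
full_match() {
    # $1 = pattern, $2 = candidate path
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

path='s3n://some-bucket/1-2013-08-22-22-something'

# Try the original pattern, the fixed one, and a stricter "only 1-" variant.
for pattern in '[0-9]-.*' '.*[0-9]-.*' '.*/1-.*'; do
    if full_match "$pattern" "$path"; then
        echo "match:    $pattern"
    else
        echo "no match: $pattern"
    fi
done
```

Under that assumption, `[0-9]-.*` fails because the path starts with `s3n://`, not with a digit. Note that `.*[0-9]-.*` accepts any digit before the hyphen, so it also lets the 2-... files through; if only the files starting with 1 are wanted, something like `.*/1-.*` may be closer to the original goal.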


Source: https://habr.com/ru/post/1498763/

