How to use Elastic MapReduce to run an XSLT transformation on millions of small XML files in S3?

Question

In particular, is there a somewhat simple streaming solution?

+3

zack Aug 11 '10 at 0:52

1 answer

Ryan Cox · Answer 1 · 2010-08-11T11:44:16+0000

Upload your data to an S3 bucket
Generate a file containing the full s3n:// path to each file
Write a mapper script that:
- Reads 'mapred_work_output_dir' from the environment (*)
- Performs the XSLT transformation on the file named in each input line, saving the result to the output directory
Write an identity reducer that does nothing
Upload the mapper / reducer scripts to an S3 bucket
Script the streaming job on AWS EMR
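The mapper step could look roughly like the sketch below. It is not the code from the answer, only one way to do it under some assumptions: the `hadoop` and `xsltproc` command-line tools are available on the task nodes, the stylesheet is shipped with the job under the hypothetical name `transform.xsl`, and each input line carries one s3n:// path. The helper names `output_path_for` and `transform` are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming mapper: one S3 path per input line."""
import os
import posixpath
import subprocess
import sys
import tempfile

# Hypothetical stylesheet name; assumed to be shipped alongside the mapper.
STYLESHEET = "transform.xsl"


def output_path_for(s3_path, output_dir):
    """Map s3n://bucket/dir/a.xml to <output_dir>/a.xml."""
    name = posixpath.basename(s3_path.strip())
    return posixpath.join(output_dir, name)


def transform(s3_path, output_dir):
    """Fetch one file, run XSLT on it, and copy the result to output_dir."""
    with tempfile.TemporaryDirectory() as tmp:
        local_in = os.path.join(tmp, "in.xml")
        local_out = os.path.join(tmp, "out.xml")
        # Assumes the hadoop and xsltproc CLIs are on the PATH of the task node.
        subprocess.check_call(["hadoop", "fs", "-get", s3_path, local_in])
        subprocess.check_call(["xsltproc", "-o", local_out, STYLESHEET, local_in])
        subprocess.check_call(
            ["hadoop", "fs", "-put", local_out, output_path_for(s3_path, output_dir)]
        )


def main(lines, environ):
    # Streaming exposes the jobconf in the environment; fall back to the
    # current directory so the script can be tried outside Hadoop.
    output_dir = environ.get("mapred_work_output_dir", ".")
    for line in lines:
        # Take the last tab-separated field in case a key precedes the path.
        path = line.rstrip("\n").split("\t")[-1].strip()
        if not path:
            continue
        transform(path, output_dir)
        # Emit a small status record as the mapper output.
        print(path + "\tOK")


if __name__ == "__main__":
    main(sys.stdin, os.environ)
```

Because each input line is only a path, the input file stays tiny and the actual XML never passes through stdin; the mapper fetches and writes the files itself.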

(*) Streaming puts your jobconf into the process environment . . .
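As a small illustration of the footnote: Hadoop Streaming passes job configuration properties to the script as environment variables, with characters such as dots replaced by underscores, so the property `mapred.work.output.dir` shows up as `mapred_work_output_dir`. The helper names below are hypothetical, and this is only a sketch of the lookup.

```python
import os
import re


def jobconf_env_name(prop):
    """Return the environment variable name Streaming uses for a property."""
    # Every character that is not a letter or digit becomes an underscore.
    return re.sub(r"[^A-Za-z0-9]", "_", prop)


def read_jobconf(prop, environ=os.environ, default=None):
    """Look up a jobconf property in the given environment mapping."""
    return environ.get(jobconf_env_name(prop), default)
```

For example, `read_jobconf("mapred.work.output.dir")` returns the task's working output directory when run inside a streaming task, and `None` when the variable is not set.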