How can I use Elastic MapReduce to run an XSLT transformation on millions of small XML files stored in S3?

In particular, is there a somewhat simple streaming solution?

1 answer

See this related question: How to process files, one per map?

  • Upload your data to an S3 bucket
  • Generate a file containing the full s3n:// path to each file, one path per line
  • Write a mapper script that:
    • Reads 'mapred_work_output_dir' from the environment (*)
    • Performs the XSLT transformation on the file named in each input line, saving the result to the output directory
  • Write a reducer that does nothing
  • Upload the mapper / reducer scripts to an S3 bucket
  • Run them as a streaming job on AWS EMR
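
The mapper step in the list above could look roughly like the following. This is only a sketch under several assumptions that are not part of the answer: the stylesheet is shipped with the job under the hypothetical name transform.xsl, the `hadoop` and `xsltproc` commands are available on every node, and the helper names (`input_path`, `output_path`, `transform`) are made up for illustration. Hadoop Streaming exports job configuration values as environment variables with dots replaced by underscores, which is where `mapred_work_output_dir` comes from.

```python
#!/usr/bin/env python
"""Streaming mapper sketch: each input line names one XML file to transform."""
import os
import posixpath
import shutil
import subprocess
import sys
import tempfile

STYLESHEET = "transform.xsl"  # hypothetical file name, assumed to ship with the job


def input_path(line):
    """Return the file path from an input line, dropping an optional leading key."""
    return line.rstrip("\n").split("\t")[-1].strip()


def output_path(out_dir, src):
    """Put the result in out_dir under the source file's base name."""
    return posixpath.join(out_dir, posixpath.basename(src))


def transform(src, dst, run=subprocess.check_call):
    """Fetch src, apply the stylesheet with xsltproc, and upload the result to dst."""
    tmp = tempfile.mkdtemp()
    try:
        local_in = os.path.join(tmp, "in.xml")
        local_out = os.path.join(tmp, "out.xml")
        run(["hadoop", "fs", "-get", src, local_in])
        run(["xsltproc", "-o", local_out, STYLESHEET, local_in])
        run(["hadoop", "fs", "-put", local_out, dst])
    finally:
        shutil.rmtree(tmp, ignore_errors=True)


def main(lines, environ=os.environ, do_transform=transform):
    """Transform every file listed in lines into the task's work output directory."""
    out_dir = environ["mapred_work_output_dir"]
    for line in lines:
        src = input_path(line)
        if not src:
            continue  # skip blank lines in the path list
        do_transform(src, output_path(out_dir, src))
        # Emit one small record per file so the job reports progress.
        print("%s\tok" % src)


# Only act as a mapper when Hadoop Streaming has exported the job configuration.
if __name__ == "__main__" and "mapred_work_output_dir" in os.environ:
    main(sys.stdin)
```

Because the output name is taken from the base name of the input path, two input files with the same name in different folders would overwrite each other; if that can happen in your bucket, derive the output name from the full path instead.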

(*) workconf . . .


Source: https://habr.com/ru/post/1759042/