Write to multiple outputs by key with Scalding on Hadoop in one MapReduce job

How can I write to multiple outputs that depend on the key using Scalding (or Cascading) in a single MapReduce job? Of course, I could use .filter for every possible key, but that is a terrible hack that would launch many jobs.

+6
scala mapreduce hadoop cascading scalding
Jun 02 '14 at 12:16
3 answers

Scalding (version 0.9.0rc16 and later) has TemplatedTsv, which works like Cascading's TemplateTap.

    Tsv(args("input"), ('COUNTRY, 'GDP))
      .read
      // In Hadoop mode this creates one directory per country under the "output" path.
      .write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
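
To make the resulting layout concrete, here is a plain-Scala sketch (no Scalding or Hadoop involved) of how a "%s" template applied to the COUNTRY value maps each record to a sub-directory of the output path. The object name, the helper partitionPaths and the sample rows are hypothetical. It only illustrates the path formatting and is not how Scalding implements TemplatedTsv.

```scala
object TemplatedLayoutSketch {
  // Hypothetical helper: groups records by the directory that the template
  // yields for their key, mimicking "one directory per key value".
  def partitionPaths(
      basePath: String,
      template: String,
      records: Seq[(String, Double)]
  ): Map[String, Seq[(String, Double)]] =
    records.groupBy { case (country, _) =>
      // The template is an ordinary format string filled with the key value.
      basePath + "/" + template.format(country)
    }

  def main(args: Array[String]): Unit = {
    // Made-up (COUNTRY, GDP) rows, for illustration only.
    val records = Seq(("US", 17.4), ("FR", 2.8), ("US", 16.8))
    partitionPaths("output", "%s", records).toSeq.sortBy(_._1).foreach {
      case (dir, rows) => println(s"$dir -> ${rows.size} row(s)")
    }
  }
}
```

With these sample rows, the two US records end up under output/US and the FR record under output/FR. A different template, such as "country=%s", would only change the directory names.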
+6
Jun 25 '14 at 12:04

Use MultipleOutputFormat and write a custom output class based on that output format, extrapolating from these other SO questions: "Create a Scalding Source like TextLine that combines multiple files into single mappers" and "Compress Output Scalding / Cascading TsvCompressed".

0
Jun 02

This post in the Cascading user group suggests using Cascading's TemplateTap. I am not sure how to connect it to Scalding.

0
Jun 02 '14 at 18:27