MapReduce output to CSV: do I need to emit the key with the value?

My map function creates

Key\tValue

Value = List (value1, value2, value3)

Then my reduce function emits:

Key\tCSV-Line

Example output:


2323232-2322 fdsfs, sdfs, dfsfs, 0,0,0,2, fsda, 3,23,3, s,

2323555-22222 dfasd, sdfas, adfs, 0,0,2,0, fasafa, 2,23, s


Ex. Raw data: 232342|@ 3423@ |34343|sfasdfasdF|433443|Sfasfdas|324343 x 1000

In any case, I want to drop the key at the beginning of each line so that my client can import the output directly into MySQL. I have about 50 data files. My question is: once the map phase has emitted the pairs and the reducer starts, do I need to print the key along with the value, or can I print just the value?


Additional Information:

This code should make the situation clearer:

http://pastebin.ca/2410217

This is what I plan to do.

2 answers

Your reducer can emit a string without a \t in it, or, in your case, just what you call the value. Unfortunately, Hadoop streaming interprets such a line as a key with an empty value and automatically appends the separator (\t by default) to the end of each line. You can change what this separator is, but when I played with it, I could not get it to stop adding a separator. I don't remember the exact details, but based on this question ( Hadoop: key and value are separated by tabs in the output file. How to do this with comma delimiters? ) I assume that the property is mapred.textoutputformat.separator. My solution was to strip the \t from the end of each line when I pulled the file back:

 hadoop fs -cat hadoopfile | perl -pe 's/\t$//' > destfile 
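As a rough illustration of the reducer side, here is a minimal Python sketch of a streaming reducer that drops the key and prints only the CSV part. It assumes the mapper output is one `key<TAB>value` pair per line and that the value is already comma-separated; the names `csv_lines`, `strip_trailing_tab`, and `main` are hypothetical and not taken from the pastebin code. `strip_trailing_tab` mirrors the perl one-liner above for anyone who prefers to clean up the pulled file in Python.

```python
import sys


def csv_lines(lines):
    """Yield only the value part of each 'key<TAB>value' input line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only; everything after it is the value.
        _key, sep, value = line.partition("\t")
        # If there is no tab at all, pass the line through unchanged.
        yield value if sep else line


def strip_trailing_tab(line):
    """Remove the single trailing tab that streaming appends to key-only lines."""
    return line[:-1] if line.endswith("\t") else line


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read mapper output from stdin and write key-less CSV lines to stdout."""
    for out in csv_lines(stdin):
        stdout.write(out + "\n")
```

In an actual reducer script you would call `main()` at the end of the file. The lines written this way still pick up the trailing separator in the job output, so the post-processing step above remains necessary.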

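If you do not want to emit a key, set the output key type to NullWritable in your code. With the default TextOutputFormat, a NullWritable key is skipped along with the separator, so each output line contains only the value. For instance: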

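 import java.io.IOException;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.NullWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Reducer;

 // Nested inside the job's driver class, hence "static".
 public static class TokenCounterReducer
         extends Reducer<Text, IntWritable, NullWritable, IntWritable> {
     @Override
     public void reduce(Text key, Iterable<IntWritable> values, Context context)
             throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable value : values) {
             sum += value.get();
         }
         // Emit only the value; the NullWritable key is not written to the output.
         context.write(NullWritable.get(), new IntWritable(sum));
         // To emit the key as well: context.write(key, new IntWritable(sum));
     }
 }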

Let me know if this is not what you need, and I will update the answer accordingly.


Source: https://habr.com/ru/post/948174/