Writing data to MySQL from a Hadoop Reducer

I am experimenting with Hadoop MapReduce, and in my tests I can store the output of the reducers in HBase. However, I want to write the data to a MySQL database instead of HBase; the mappers will still read their input from HBase. I found this, but it requires using MySQL for both input and output, while I only need it for the output. In addition, it uses some obsolete classes from the org.apache.hadoop.mapred package, which has since been superseded by the new org.apache.hadoop.mapreduce package; however, so far I cannot find any tutorial that uses the new package.

1 answer

I found this, but it requires using MySQL for both input and output, while I only need it for the output.

The InputFormat (DBInputFormat) is independent of the OutputFormat (DBOutputFormat), so it should be possible to read from HBase in the Mapper and write to the database in the Reducer, as sketched below.
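
Here is a minimal sketch of that output side with the new API, assuming a word-count-style reduce and a hypothetical MySQL table word_counts(word, count); the class and field names are made up for illustration. DBOutputFormat writes the reduce output key to the database and ignores the value, so the record class implements DBWritable:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Hypothetical record mapped onto a MySQL table word_counts(word, count).
public class WordCountRecord implements Writable, DBWritable {
    private String word;
    private int count;

    public WordCountRecord() {}    // DBOutputFormat needs a no-arg constructor

    public WordCountRecord(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // DBWritable: fills the INSERT statement that DBOutputFormat generates.
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setString(1, word);
        stmt.setInt(2, count);
    }

    public void readFields(ResultSet rs) throws SQLException {
        word = rs.getString(1);
        count = rs.getInt(2);
    }

    // Writable: conventional, in case Hadoop ever serializes the record itself.
    public void write(DataOutput out) throws IOException {
        Text.writeString(out, word);
        out.writeInt(count);
    }

    public void readFields(DataInput in) throws IOException {
        word = Text.readString(in);
        count = in.readInt();
    }
}

// The reducer emits the record as the key; DBOutputFormat ignores the value.
class MySqlWriteReducer
        extends Reducer<Text, IntWritable, WordCountRecord, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(new WordCountRecord(key.toString(), sum), NullWritable.get());
    }
}
```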

With the new MR API, set whatever I/O formats are required via Job#setInputFormatClass and Job#setOutputFormatClass; with the old MR API, use JobConf#setInputFormat and JobConf#setOutputFormat. The two formats do not have to be the same: it should equally be possible to read from XML in the mapper and write to a queue in the reducer if required. A driver sketch follows.
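
Here is a sketch of a driver that wires the two sides together with the new API, reusing the record and reducer from the sketch above. The HBase table name, JDBC URL, credentials, and the trivial mapper are placeholders for your own:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

public class HBaseToMySqlJob {

    // Hypothetical mapper: emits one count per HBase row key.
    static class RowMapper extends TableMapper<Text, IntWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result columns, Context context)
                throws IOException, InterruptedException {
            String key = Bytes.toString(row.get(), row.getOffset(), row.getLength());
            context.write(new Text(key), new IntWritable(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // JDBC settings for the output side only; URL and credentials are placeholders.
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost:3306/mydb",
                "user", "password");

        Job job = new Job(conf, "hbase-to-mysql");
        job.setJarByClass(HBaseToMySqlJob.class);

        // Input side: scan an HBase table; this wires up TableInputFormat.
        TableMapReduceUtil.initTableMapperJob(
                "source_table", new Scan(), RowMapper.class,
                Text.class, IntWritable.class, job);

        // Output side: setOutput also calls job.setOutputFormatClass(DBOutputFormat.class).
        DBOutputFormat.setOutput(job, "word_counts", "word", "count");
        job.setReducerClass(MySqlWriteReducer.class);
        job.setOutputKeyClass(WordCountRecord.class);
        job.setOutputValueClass(NullWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```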

In addition, the link above uses some obsolete classes from the org.apache.hadoop.mapred package, which has since been superseded by the new org.apache.hadoop.mapreduce package; however, so far I cannot find any tutorial that uses the new package.

If you are comfortable with the old API, go ahead and use it; there is not much difference in functionality between the new and the old APIs. There are two DBInputFormat/DBOutputFormat classes, one pair for the old API and one for the new. Just make sure you do not mix old/new InputFormats with the new/old MR API; the package names below show which goes with which.
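
The package paths make the pairing explicit. For example, with the new API the imports would be:

```java
// Old API -- pairs with JobConf and org.apache.hadoop.mapred:
//   org.apache.hadoop.mapred.lib.db.DBInputFormat
//   org.apache.hadoop.mapred.lib.db.DBOutputFormat

// New API -- pairs with Job and org.apache.hadoop.mapreduce:
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
```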

Here is a tutorial on the new API.


Source: https://habr.com/ru/post/1384925/
