I found this, but it uses MySQL both for the input and for the output, while I need it only on the output side.
The InputFormat (DBInputFormat) is independent of the OutputFormat (DBOutputFormat), so it should be possible to read from HBase in the Mapper and write to the DB in the Reducer.
With the new MR API, call Job#setInputFormatClass and Job#setOutputFormatClass; with the old MR API, call JobConf#setInputFormat and JobConf#setOutputFormat with whatever I/O formats are required. The two formats do not have to be the same, so it should equally be possible to read from XML in the mapper and, if needed, write to a queue in the reducer.
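To make that concrete, below is a minimal driver sketch under the new API that scans an HBase table and writes the reduce output to MySQL. Everything specific in it is an assumption for illustration, not from the original post: MetricsMapper (a TableMapper), MetricsReducer, ResultRecord (a DBWritable), the table names, and the JDBC settings are all placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;

public class HBaseToMySqlDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();

        // JDBC settings for the output side only; the input side never sees them.
        // Driver class, URL, and credentials are illustrative.
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost:3306/mydb", "user", "password");

        Job job = new Job(conf, "hbase-to-mysql");
        job.setJarByClass(HBaseToMySqlDriver.class);

        // Input: full scan of the HBase table "metrics"; this call also sets
        // TableInputFormat as the job's InputFormat.
        Scan scan = new Scan();
        scan.setCaching(500);
        TableMapReduceUtil.initTableMapperJob(
                "metrics", scan, MetricsMapper.class,
                Text.class, IntWritable.class, job);

        // Output: each reduce key is written as one row of the "results" table.
        DBOutputFormat.setOutput(job, "results", "name", "total");
        job.setReducerClass(MetricsReducer.class);
        job.setOutputKeyClass(ResultRecord.class);   // hypothetical DBWritable
        job.setOutputValueClass(NullWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note how the two sides are configured independently: TableMapReduceUtil touches only the input half of the job, DBConfiguration/DBOutputFormat only the output half.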
In addition, the link above uses some obsolete classes from the org.apache.hadoop.mapred package, for which the new org.apache.hadoop.mapreduce package is now available; however, I have not been able to find any tutorial that uses this new package so far.
If you like the old API, use it; there is not much difference in functionality between the new and the old one. There are two DBInputFormat classes, one for the old API and one for the new one. Just make sure you do not mix old/new InputFormats with the old/new MR API, as sketched below.
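A minimal sketch of the correct pairings, with the classes fully qualified to make the two packages explicit (the job names and configuration are placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;

public class ApiPairing {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // New MR API: Job pairs with the mapreduce-package DBInputFormat.
        Job newApiJob = new Job(conf, "new-api-job");
        newApiJob.setInputFormatClass(
                org.apache.hadoop.mapreduce.lib.db.DBInputFormat.class);

        // Old MR API: JobConf pairs with the mapred-package DBInputFormat.
        JobConf oldApiJob = new JobConf(conf);
        oldApiJob.setInputFormat(
                org.apache.hadoop.mapred.lib.db.DBInputFormat.class);
    }
}
```

Crossing the pairs (e.g. passing the mapred DBInputFormat to Job#setInputFormatClass) will not compile, which is exactly the mix-up to avoid.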
Below is a tutorial on the new API.