The Wikimedia Foundation has just released an InputReader for the Hadoop Streaming interface that can read the bz2-compressed dump files and send them to your mappers. The unit sent to a mapper is not a whole page but a pair of revisions (so you can actually run a diff on the two revisions). This is an initial release, and I'm sure there will be some bugs, but please give it a spin and help us test it.
This InputReader requires Hadoop 0.21, because that version has streaming support for bz2 files. The source code is available at: https://github.com/whym/wikihadoop
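
To give an idea of what a mapper could do with a pair of revisions, here is a rough Python sketch of the diff step. The record layout is an assumption on my part, not something taken from the wikihadoop docs: I'm assuming each record is an XML fragment with a `<page>` root holding two `<revision>` elements, each with a `<text>` child and no XML namespace. The helper names `revision_texts` and `diff_pair` are made up for this example. Check the README in the repository for the real format before relying on it.

```python
import difflib
import xml.etree.ElementTree as ET


def revision_texts(record):
    """Return the <text> body of every <revision> in one record.

    Assumed layout (hypothetical): <page> ... <revision><text>...</text></revision> x2 </page>
    """
    page = ET.fromstring(record)
    # findtext returns None when <text> is missing or empty, so fall back to "".
    return [rev.findtext("text") or "" for rev in page.iter("revision")]


def diff_pair(old, new):
    """Unified diff between two revision texts, as a list of diff lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


def diff_record(record):
    """Diff the first two revisions found in a record; empty list if fewer than two."""
    texts = revision_texts(record)
    if len(texts) < 2:
        return []
    return diff_pair(texts[0], texts[1])
```

In a real streaming job the mapper script would read records from stdin, call something like `diff_record` on each one, and print whatever key/value output your reducer expects. How records are delimited on stdin depends on the InputReader, so that part is left out here.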