How can I get a subset (say, 100 MB) of Wikipedia's pages? I found that you can download the entire dataset as an XML dump, but that is more like 1 or 2 GB, and I don't need that much.
I want to experiment with implementing the MapReduce algorithm.
That said, if I could just find 100 MB of text data from anywhere else, that would be fine too. For example, a database dump, if one is available, would be about the right size. I am open to suggestions.
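For reference, if only the full dump turns out to be available, the rough approach I had in mind is to stream the bz2-compressed XML over HTTP and stop once about 100 MB has been decompressed, so the whole thing never has to be downloaded. This is just a sketch; the dump URL and the assumption that the file is a single bz2 stream are mine, not anything official.

```python
import bz2
import urllib.request

# Assumed dump location; check dumps.wikimedia.org for the current file name.
DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
            "enwiki-latest-pages-articles.xml.bz2")
TARGET_BYTES = 100 * 1024 * 1024  # stop after ~100 MB of decompressed XML

decompressor = bz2.BZ2Decompressor()
written = 0

with urllib.request.urlopen(DUMP_URL) as response, \
        open("wiki-subset.xml", "wb") as out:
    while written < TARGET_BYTES:
        chunk = response.read(64 * 1024)   # compressed bytes straight off the wire
        if not chunk or decompressor.eof:  # connection closed or stream ended
            break
        data = decompressor.decompress(chunk)
        out.write(data)
        written += len(data)

print(f"wrote {written} bytes of XML to wiki-subset.xml")
```

The resulting file will be cut off in the middle of a `<page>` element, so whatever parses it has to tolerate a truncated final record, but for a word-count style MapReduce experiment that seems good enough.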
Edit: Is there anything that isn't a torrent? I can't get them to work.