A huge amount of text data for parsing

Question


I am developing a Ruby parser that handles irregular text data. Can someone tell me where I can get a lot of open text data?

+6

dataset plaintext

user724707 Apr 26 '11 at 3:53


2 answers

You can grab a copy of Wikipedia (or just run a bunch of its pages through lynx -dump). That will also give you an extensive source of non-English text. Project Gutenberg is another good source of plain text.
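For what it's worth, here is a rough sketch of how the lynx -dump idea might be driven from Ruby. It assumes lynx is installed and on your PATH. The helper names (wikipedia_url, lynx_dump_command, dump_page) are made up for illustration, and the URL builder only swaps spaces for underscores, so titles with other special characters would need proper escaping.

```ruby
require "open3"

# Hypothetical helper: build a Wikipedia article URL from a page title.
# Assumption: replacing spaces with underscores is enough for the titles used.
def wikipedia_url(title, lang: "en")
  "https://#{lang}.wikipedia.org/wiki/#{title.tr(' ', '_')}"
end

# Hypothetical helper: argv for lynx. -dump writes the rendered page to
# stdout as plain text; -nolist leaves out the trailing list of links.
def lynx_dump_command(url)
  ["lynx", "-dump", "-nolist", url]
end

# Run lynx and return the page text, or nil if lynx is missing or fails.
def dump_page(url)
  stdout, status = Open3.capture2(*lynx_dump_command(url))
  status.success? ? stdout : nil
rescue Errno::ENOENT
  nil # lynx is not installed
end
```

Something like `text = dump_page(wikipedia_url("Ruby (programming language)", lang: "de"))` would then give you one page of non-English plain text to feed the parser. If you need a lot of pages, pause between requests, or use the downloadable dumps instead.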

+4

mu is too short Apr 26 '11 at 4:01


intellidiot · Accepted Answer · 2011-04-26T03:54:16+0000

Here is a list of many sources:

http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public

And my favorite:

http://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/tv+movies/imdb/
