I want to make testing and benchmarking full-text search a little easier, so the data set should have the following qualities:
- 10,000 to 100,000 records.
- A good variety of English words.
- In CSV or Excel format, i.e. I do not want to access it through an API.
Something like books or films with title and description fields would be ideal. I looked at the UCI Machine Learning Repository, but its data sets were too numbers-oriented.
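For reference, this is roughly how I would check a candidate CSV against the criteria above. It is only a minimal sketch: the column names `title` and `description` and the helper names `summarize_text_csv` and `meets_criteria` are made up for illustration, and the distinct-word count is just a crude stand-in for "variety", since I have no fixed threshold for that.

```python
import csv
import io
import re


def summarize_text_csv(handle, text_fields):
    """Count records and distinct lowercase word tokens in the given columns."""
    reader = csv.DictReader(handle)
    records = 0
    vocabulary = set()
    for row in reader:
        records += 1
        for field in text_fields:
            # Missing columns or empty cells are treated as empty text.
            text = (row.get(field) or "").lower()
            vocabulary.update(re.findall(r"[a-z']+", text))
    return records, len(vocabulary)


def meets_criteria(records, min_records=10_000, max_records=100_000):
    """Check the record-count range from the list above."""
    return min_records <= records <= max_records


# Tiny in-memory sample standing in for a real file (hypothetical data).
sample = io.StringIO(
    "title,description\n"
    "Dune,A desert planet and a war over spice\n"
    "Emma,A young woman meddles in romance\n"
)
records, distinct_words = summarize_text_csv(sample, ["title", "description"])
print(records, distinct_words, meets_criteria(records))
```

For a real file, the `io.StringIO` sample would be replaced with something like `open(path, newline="", encoding="utf-8")`.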