Natural Language / Text Mining and Reddit / Social News Site

There is a lot of natural language data associated with sites like Reddit, Digg, or news.google.com.

I have done a little research on text mining, but I can't figure out how to apply these tools to parse something like Reddit.

What applications can you come up with?

+3
3 answers

In my experience, the best way to get data from sites like Reddit or Digg is to start with the developer API they provide. You usually have a focused interest in a topic or a trend, and the established public interface is the only practical way to get that data. You can also parse their feeds and combine them, which covers roughly 90% of what you would want to know. If you want to dig deeply into data that the API does not expose, be prepared to spend a considerable amount of time writing custom wrappers around a tool such as cURL. If you have a budget, you can also contact them and ask whether they offer paid research data.
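To make the "parse feeds and combine them" idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not a description of any site's official client: the helper names (`fetch_feed`, `extract_titles`, `top_terms`), the stopword list, and the sample feeds are all made up for this example, and it assumes the feeds are plain RSS 2.0 or Atom documents with a `title` per item.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from collections import Counter

ATOM_NS = "{http://www.w3.org/2005/Atom}"
# Tiny illustrative stopword list (an assumption; use a real one in practice).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}


def fetch_feed(url):
    """Download a feed and return its raw bytes (hypothetical helper)."""
    # Many sites reject requests without a descriptive User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "feed-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def extract_titles(feed_xml):
    """Return item titles from an RSS 2.0 or Atom document (str or bytes)."""
    root = ET.fromstring(feed_xml)
    titles = []
    # RSS 2.0 layout: <rss><channel><item><title>
    for item in root.iter("item"):
        title = item.find("title")
        if title is not None and title.text:
            titles.append(title.text.strip())
    # Atom layout: <feed><entry><title>, all in the Atom namespace
    for entry in root.iter(ATOM_NS + "entry"):
        title = entry.find(ATOM_NS + "title")
        if title is not None and title.text:
            titles.append(title.text.strip())
    return titles


def top_terms(feeds, n=5):
    """Combine several feeds and count the most frequent title words."""
    counts = Counter()
    for feed_xml in feeds:
        for title in extract_titles(feed_xml):
            words = re.findall(r"[a-z']+", title.lower())
            counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(n)


# Made-up sample feeds so the sketch runs without network access.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Sample RSS feed</title>
  <item><title>Python text mining tips</title></item>
  <item><title>Mining Reddit comments</title></item>
</channel></rss>"""

SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample Atom feed</title>
  <entry><title>Text mining with Nutch</title></entry>
</feed>"""

if __name__ == "__main__":
    print(top_terms([SAMPLE_RSS, SAMPLE_ATOM], n=3))
```

With real data you would pass the results of `fetch_feed(...)` for each feed URL you care about into `top_terms`. The word counts are only a starting point; the same combined list of titles could feed any other text mining step.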

+3

RSS, Nutch.

+1

These are good ideas. I can get data, but what applications can be built around it?

0

Source: https://habr.com/ru/post/1698609/