Sentiment Analysis Training Data

Where can I get a corpus of documents from the corporate domain that have already been labeled with positive / negative sentiment? I need a large collection of documents containing reviews of companies, such as the company reviews published by analysts and the media.

I have found corpora of product and movie reviews. Is there one for the business domain, with company reviews written in the language of business?

machine-learning nlp sentiment-analysis training-data text-analysis
Sep 26 '11 at 6:18
6 answers

http://www.cs.cornell.edu/home/llee/data/

http://mpqa.cs.pitt.edu/corpora/mpqa_corpus

You can use Twitter, with its smileys serving as sentiment labels, for example: http://web.archive.org/web/20111119181304/http://deepthoughtinc.com/wp-content/uploads/2011/01/Twitter-as-a-Corpus-for-Sentiment-Analysis-and-Opinion-Mining.pdf
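
Using smileys this way is a form of distant supervision: the emoticon acts as a noisy label and is then stripped from the text. A minimal sketch, assuming hand-picked emoticon lists (not necessarily the ones used in the linked paper) and the simple rule of discarding tweets that have no emoticon or mixed ones:

```python
import re

# Assumed emoticon sets; extend them for your data.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}
# Longest emoticons first so ":-)" is matched as a whole.
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(POSITIVE | NEGATIVE, key=len, reverse=True))
)


def label_by_emoticon(tweet):
    """Return (clean_text, label), or None if the tweet has no usable signal."""
    found = set(EMOTICON_RE.findall(tweet))
    has_pos = bool(found & POSITIVE)
    has_neg = bool(found & NEGATIVE)
    if has_pos == has_neg:  # no emoticon at all, or both polarities
        return None
    # Remove the emoticons so a classifier cannot just memorise them.
    clean = " ".join(EMOTICON_RE.sub(" ", tweet).split())
    return clean, ("positive" if has_pos else "negative")
```

For example, `label_by_emoticon("Great earnings call today :)")` gives `("Great earnings call today", "positive")`.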

Hope this gets you started. There is more in the literature if you are interested in specific subtasks, such as negation, sentiment scope, etc.

To focus on companies, you can combine the method with topic detection or, more cheaply, simply collect lots of mentions of a given company. Alternatively, you can have your data annotated by Mechanical Turk workers.
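
The cheap option of collecting mentions of a company can be as simple as a whole-word match against a list of aliases. A minimal sketch; the alias list (e.g. `["Acme Corp", "Acme"]`) is hypothetical and has to be curated per company:

```python
import re


def mentions(text, aliases):
    """True if any alias occurs as a standalone word or phrase (case-insensitive)."""
    # (?<!\w) / (?!\w) instead of \b so aliases ending in punctuation still match.
    pattern = r"(?<!\w)(?:" + "|".join(re.escape(a) for a in aliases) + r")(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def company_corpus(docs, aliases):
    """Keep only the documents that mention the company."""
    return [doc for doc in docs if mentions(doc, aliases)]
```

The word-boundary check keeps "Acmeville" from counting as a mention of "Acme"; topic detection would be the more robust but more expensive alternative.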

Sep 26 '11 at 12:20

This is a list I wrote a few weeks ago on my blog. Some of these datasets have recently been included in the Python NLTK platform.

Dictionaries:

Datasets:

Literature:

Oct 19 '15 at 13:30
Sep 26 '11 at 16:53

If you have sources (media channels, blogs, etc.) for the domain you want to explore, you can build your own corpus. I do this in Python:

  • Use Beautiful Soup http://www.crummy.com/software/BeautifulSoup/ to parse the content that I want to classify.
  • Separate out the sentences that express positive / negative opinions about companies.
  • Use NLTK to process these sentences: tokenize words, POS-tag them, etc.
  • Use NLTK's PMI association measure to find the most strongly associated bigrams or trigrams within each class.
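
In NLTK the last step is typically done with `BigramCollocationFinder` scored by `BigramAssocMeasures.pmi`. To keep the sketch dependency-free, the version below computes the standard PMI formula directly, uses a simple regex tokenizer as a stand-in for NLTK's, and covers bigrams only (trigrams work the same way). Run it once per class, on that class's sentences:

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Simple regex tokenizer standing in for an NLTK tokenizer.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def top_pmi_bigrams(sentences, min_count=2, n=5):
    """Rank bigrams by PMI = log2(c(w1 w2) * N / (c(w1) * c(w2)))."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = tokenize(s)
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))  # bigrams never cross sentences
    total = sum(unigrams.values())
    scored = {
        bg: math.log2(c * total / (unigrams[bg[0]] * unigrams[bg[1]]))
        for bg, c in bigrams.items()
        if c >= min_count  # PMI overrates rare pairs, so filter them out
    }
    # Highest PMI first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda bg: (-scored[bg], bg))[:n]
```

The `min_count` filter matters: PMI gives very high scores to pairs seen only once, which is why a frequency filter is usually applied before ranking.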

Creating a corpus is hard work (preprocessing, checking, labeling, etc.), but training a model for your specific domain often increases accuracy. If you can get a ready-made corpus, just go ahead with the sentiment analysis ;)
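
Once a domain corpus is labeled, training a first model on it can be quite small. The answer does not name a classifier; multinomial Naive Bayes with add-one smoothing is used here only as a common baseline, in plain Python:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label in self.class_counts:
            score = math.log(self.class_counts[label] / n_docs)  # log prior
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1)
                                      / (self.totals[label] + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best
```

Usage: `NaiveBayes().fit(docs, labels).predict("strong demand")`, where `docs` and `labels` come from your own annotated domain corpus.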

Mar 07 2018-12-12T00:

I don't know of any such package, but you could try an unsupervised method on an unlabeled dataset.
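
One classic unsupervised option, named here as an example rather than what the answerer had in mind, is Turney-style semantic orientation: a word leans positive if it co-occurs more with a positive seed word ("excellent") than with a negative one ("poor"). Turney measured co-occurrence with web search hits; this sketch assumes sentence-level co-occurrence inside your own unlabeled corpus instead:

```python
import math
import re
from collections import Counter

# Seed words as in Turney's method; domain-specific seeds may work better.
POS_SEED, NEG_SEED = "excellent", "poor"


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def semantic_orientation(sentences, smoothing=0.01):
    """SO(w) = log2(hits(w, pos) * hits(neg) / (hits(w, neg) * hits(pos))).

    hits(w, seed) counts sentences containing both w and the seed; `smoothing`
    is added to every count to avoid division by zero.
    """
    near_pos, near_neg = Counter(), Counter()
    n_pos = n_neg = 0
    for sentence in sentences:
        words = set(tokens(sentence))
        context = words - {POS_SEED, NEG_SEED}
        if POS_SEED in words:
            n_pos += 1
            near_pos.update(context)
        if NEG_SEED in words:
            n_neg += 1
            near_neg.update(context)
    s = smoothing
    return {
        w: math.log2(((near_pos[w] + s) * (n_neg + s))
                     / ((near_neg[w] + s) * (n_pos + s)))
        for w in set(near_pos) | set(near_neg)
    }


def score(text, orientation):
    """Sum the orientation of known words; a positive total leans positive."""
    return sum(orientation.get(w, 0.0) for w in tokens(text))
```

With a realistic corpus you would restrict scoring to adjectives or adjective phrases (Turney used POS patterns), which is where the POS tagging step mentioned in the previous answer comes in.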

Sep 26 '11 at 8:40

You can get a large selection of online reviews from Datafiniti. Most reviews come with rating data, which gives more granular sentiment than positive / negative. Here is a list of businesses with reviews, and here is a list of products with reviews.
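
If you only need positive / negative classes, the ratings can be collapsed into labels (or kept as a graded target). A minimal sketch, assuming a 1 to 5 star scale and cut-offs chosen for illustration; the `"text"` and `"rating"` keys are hypothetical, not a documented Datafiniti schema:

```python
def rating_to_label(rating, scale=5):
    """Collapse a 1..scale star rating into a coarse sentiment label.

    Assumed cut-offs: bottom quarter of the scale is negative,
    top quarter is positive, the rest is neutral.
    """
    if not 1 <= rating <= scale:
        raise ValueError(f"rating {rating} is outside 1..{scale}")
    position = (rating - 1) / (scale - 1)  # 0.0 = worst, 1.0 = best
    if position <= 0.25:
        return "negative"
    if position >= 0.75:
        return "positive"
    return "neutral"


def label_reviews(reviews):
    """Turn review dicts (hypothetical "text"/"rating" keys) into (text, label) pairs."""
    return [
        (r["text"], rating_to_label(r["rating"]))
        for r in reviews
        if r.get("rating") is not None  # skip reviews without a rating
    ]
```

On a 5-star scale this maps 1 and 2 to negative, 3 to neutral, and 4 and 5 to positive; dropping the neutral class gives a binary corpus.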

Jun 20 '13 at 19:46


