I am doing Twitter sentiment analysis with Python and NLTK. I need a dictionary of positive and negative polarity words. I have read a lot about SentiWordNet, but when I use it in my project it gives neither accurate nor fast results, and I suspect I am not using it correctly. Can someone tell me the right way to use it? Here are the steps I have taken so far:
- tokenizing the tweets
- POS tagging the tokens
- passing each tagged token to SentiWordNet
I use the NLTK package for tokenization and POS tagging. Here is the relevant part of my code:
import nltk
from nltk.stem import *
from nltk.corpus import sentiwordnet as swn

pscore = 0   # running positive total (was never initialised before)
nscore = 0   # running negative total
tokens = nltk.word_tokenize(row)       # row holds the text of one tweet
tagged = nltk.pos_tag(tokens)          # Penn Treebank tags, e.g. NN, VB, JJ, RB
for i in range(0, len(tagged)):
    word, tag = tagged[i]
    # senti_synsets may return a lazy iterator in recent NLTK, so wrap it in list()
    if 'NN' in tag and list(swn.senti_synsets(word, 'n')):
        pscore += list(swn.senti_synsets(word, 'n'))[0].pos_score()
        nscore += list(swn.senti_synsets(word, 'n'))[0].neg_score()
    elif 'VB' in tag and list(swn.senti_synsets(word, 'v')):
        pscore += list(swn.senti_synsets(word, 'v'))[0].pos_score()
        nscore += list(swn.senti_synsets(word, 'v'))[0].neg_score()
    elif 'JJ' in tag and list(swn.senti_synsets(word, 'a')):
        pscore += list(swn.senti_synsets(word, 'a'))[0].pos_score()
        nscore += list(swn.senti_synsets(word, 'a'))[0].neg_score()
    elif 'RB' in tag and list(swn.senti_synsets(word, 'r')):
        pscore += list(swn.senti_synsets(word, 'r'))[0].pos_score()
        nscore += list(swn.senti_synsets(word, 'r'))[0].neg_score()
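While looking into the speed problem, I noticed that each token can trigger up to three identical senti_synsets calls. For comparison, here is a sketch of the same loop with the lookup done once per token; the TAG_TO_WN mapping and the word_scores helper are just names I made up for illustration, not part of my current code:

import nltk
from nltk.corpus import sentiwordnet as swn

# Map the first two letters of a Penn Treebank tag to a WordNet POS code.
TAG_TO_WN = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}

def word_scores(word, tag):
    """Return (pos, neg) of the first SentiWordNet synset, or (0, 0) if none."""
    wn_pos = TAG_TO_WN.get(tag[:2])
    if wn_pos is None:
        return 0.0, 0.0
    synsets = list(swn.senti_synsets(word, wn_pos))   # single lookup per token
    if not synsets:
        return 0.0, 0.0
    return synsets[0].pos_score(), synsets[0].neg_score()

pscore = nscore = 0.0
for word, tag in nltk.pos_tag(nltk.word_tokenize(row)):
    p, n = word_scores(word, tag)
    pscore += p
    nscore += n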
In the end, I will count how many tweets are positive and how many are negative. Where am I going wrong? What is the correct way to use SentiWordNet? And is there another, similar lexicon that is easier to use?
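For context, this is roughly how I intend to do the final count, assuming I keep one (pscore, nscore) pair per tweet; tweet_scores is a placeholder name for that list:

# tweet_scores is a hypothetical list of (pscore, nscore) pairs, one per tweet
positive = sum(1 for p, n in tweet_scores if p > n)
negative = sum(1 for p, n in tweet_scores if n > p)
neutral = len(tweet_scores) - positive - negative   # ties / no sentiment words found
print(positive, "positive tweets,", negative, "negative tweets,", neutral, "neutral")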